Computes the effective sample size (ESS) for one or more weighting schemes, optionally stratified by treatment groups. ESS reflects how many observations you would have if all were equally weighted.
Usage
check_ess(
.data,
.weights = NULL,
.group = NULL,
include_observed = TRUE,
n_tiles = 4,
tile_labels = NULL
)
Arguments
- .data
A data frame containing the variables to analyze.
- .weights
Optional weighting variables. Can be unquoted variable names, a character vector, or NULL. Multiple weights can be provided to compare different weighting schemes.
- .group
Optional grouping variable. When provided, ESS is calculated separately for each group level. For continuous variables, groups are created using quantiles.
- include_observed
Logical. If using
.weights
, also calculate observed (unweighted) metrics? Defaults to TRUE.- n_tiles
For continuous
.group
variables, the number of quantile groups to create. Default is 4 (quartiles).- tile_labels
Optional character vector of labels for the quantile groups when
.group
is continuous. If NULL, uses "Q1", "Q2", etc.
Value
A tibble with columns:
- method
Character. The weighting method ("observed" or weight variable name).
- group
Character. The group level (if
.group
is provided).- n
Integer. The number of observations in the group.
- ess
Numeric. The effective sample size.
- ess_pct
Numeric. ESS as a percentage of the actual sample size.
Details
The effective sample size (ESS) is calculated using the classical formula: \(ESS = (\sum w)^2 / \sum(w^2)\).
When weights vary substantially, the ESS can be much smaller than the actual number of observations, indicating that a few observations carry disproportionately large weights.
When .group
is provided, ESS is calculated separately for each group level:
For binary/categorical exposures: ESS is computed within each treatment level
For continuous exposures: The variable is divided into quantiles (using
dplyr::ntile()
) and ESS is computed within each quantile
The function returns results in a tidy format suitable for plotting or further analysis.
See also
ess()
for the underlying ESS calculation, plot_ess()
for visualization
Other balance functions:
bal_corr()
,
bal_ess()
,
bal_ks()
,
bal_model_auc()
,
bal_model_roc_curve()
,
bal_qq()
,
bal_smd()
,
bal_vr()
,
check_balance()
,
check_model_auc()
,
check_model_roc_curve()
,
check_qq()
,
plot_balance()
Examples
# Overall ESS for different weighting schemes
check_ess(nhefs_weights, .weights = c(w_ate, w_att, w_atm))
#> # A tibble: 4 × 4
#> method n ess ess_pct
#> <chr> <int> <dbl> <dbl>
#> 1 observed 1566 1566 100
#> 2 w_ate 1566 1013. 64.7
#> 3 w_atm 1566 1089. 69.5
#> 4 w_att 1566 1070. 68.3
# ESS by treatment group (binary exposure)
check_ess(nhefs_weights, .weights = c(w_ate, w_att), .group = qsmk)
#> # A tibble: 6 × 5
#> method group n ess ess_pct
#> <chr> <fct> <int> <dbl> <dbl>
#> 1 observed 0 1163 1163 100
#> 2 observed 1 403 403 100
#> 3 w_ate 0 1163 1129. 97.0
#> 4 w_ate 1 403 326. 80.9
#> 5 w_att 0 1163 796. 68.4
#> 6 w_att 1 403 403 100
# ESS by treatment group (categorical exposure)
check_ess(nhefs_weights, .weights = w_cat_ate, .group = alcoholfreq_cat)
#> # A tibble: 12 × 5
#> method group n ess ess_pct
#> <chr> <fct> <int> <dbl> <dbl>
#> 1 observed none 195 195 100
#> 2 observed lt_12_per_year 328 328 100
#> 3 observed 1_4_per_month 494 494 100
#> 4 observed 2_3_per_week 219 219 100
#> 5 observed daily 325 325 100
#> 6 observed NA 5 5 100
#> 7 w_cat_ate none 195 98.4 50.4
#> 8 w_cat_ate lt_12_per_year 328 251. 76.7
#> 9 w_cat_ate 1_4_per_month 494 452. 91.6
#> 10 w_cat_ate 2_3_per_week 219 181. 82.8
#> 11 w_cat_ate daily 325 246. 75.7
#> 12 w_cat_ate NA 5 NaN NaN
# ESS by quartiles of a continuous variable
check_ess(nhefs_weights, .weights = w_ate, .group = age, n_tiles = 4)
#> # A tibble: 8 × 5
#> method group n ess ess_pct
#> <chr> <fct> <int> <dbl> <dbl>
#> 1 observed Q1 392 392 100
#> 2 observed Q2 392 392 100
#> 3 observed Q3 391 391 100
#> 4 observed Q4 391 391 100
#> 5 w_ate Q1 392 212. 54.1
#> 6 w_ate Q2 392 230. 58.6
#> 7 w_ate Q3 391 281. 71.9
#> 8 w_ate Q4 391 317. 81.0
# Custom labels for continuous groups
check_ess(nhefs_weights, .weights = w_ate, .group = age,
n_tiles = 3, tile_labels = c("Young", "Middle", "Older"))
#> # A tibble: 6 × 5
#> method group n ess ess_pct
#> <chr> <fct> <int> <dbl> <dbl>
#> 1 observed Young 522 522 100
#> 2 observed Middle 522 522 100
#> 3 observed Older 522 522 100
#> 4 w_ate Young 522 293. 56.1
#> 5 w_ate Middle 522 326. 62.4
#> 6 w_ate Older 522 417. 80.0
# Without unweighted comparison
check_ess(nhefs_weights, .weights = w_ate, .group = qsmk,
include_observed = FALSE)
#> # A tibble: 2 × 5
#> method group n ess ess_pct
#> <chr> <fct> <int> <dbl> <dbl>
#> 1 w_ate 0 1163 1129. 97.0
#> 2 w_ate 1 403 326. 80.9