tidy_smd() calculates the standardized mean difference (SMD) between groups for variables in a dataset. Optionally, you may also calculate weighted SMDs. tidy_smd() wraps smd::smd(), returning a tidy data frame with the columns variable, method, and smd, as well as a fourth column that contains the level of .group the SMD represents. You may also supply multiple weights to calculate multiple weighted SMDs, which is useful when comparing different types of weights. Additionally, the .wts argument supports matched datasets, where the variable supplied to .wts is a binary variable indicating whether the row was included in the match. If you're using MatchIt, the helper function bind_matches() will bind these indicators to the original dataset, making it easier to compare across matching specifications.
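
For intuition about what an SMD measures, here is a minimal base R sketch for a single continuous variable and a two-level group. It assumes the common definition (difference in group means divided by the square root of the average group variance). The helper smd_by_hand() and the toy data are hypothetical and are not part of this package; smd::smd() may differ in sign convention and other details.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  group = c(0, 0, 0, 0, 1, 1, 1, 1),
  age   = c(30, 35, 40, 45, 40, 45, 50, 55)
)

# Hypothetical helper: SMD of x between the two levels of g.
# The denominator is the square root of the average of the two group variances.
smd_by_hand <- function(x, g) {
  lv <- sort(unique(g))
  x_ref <- x[g == lv[1]]  # reference level (first level)
  x_cmp <- x[g == lv[2]]  # comparison level
  pooled_sd <- sqrt((var(x_ref) + var(x_cmp)) / 2)
  # Sign convention here is comparison minus reference (an assumption)
  (mean(x_cmp) - mean(x_ref)) / pooled_sd
}

smd_result <- smd_by_hand(toy$age, toy$group)
smd_result
```

Because the difference is scaled by a standard deviation, the result is unitless, which is what makes SMDs comparable across variables measured on different scales.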

tidy_smd(
  .df,
  .vars,
  .group,
  .wts = NULL,
  include_observed = TRUE,
  include_unweighted = NULL,
  na.rm = FALSE,
  gref = 1L,
  std.error = FALSE,
  make_dummy_vars = FALSE
)

Arguments

.df

A data frame

.vars

Variables for which to calculate SMD

.group

Grouping variable

.wts

Variables to use for weighting the SMD calculation. These can be, for instance, propensity score weights or a binary indicator signaling whether or not a participant was included in a matching algorithm.
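
To illustrate why a binary match indicator can be passed as a weight, the sketch below computes a weighted SMD by hand in base R. The helpers w_mean(), w_var(), and weighted_smd() and the toy data are hypothetical; this is one common weighted formulation and is not necessarily the exact calculation smd::smd() performs. Under this formulation, a 0/1 weight gives the same result as subsetting to the matched rows.

```r
# Toy data (hypothetical): `matched` marks rows kept by a matching algorithm
toy <- data.frame(
  group   = c(0, 0, 0, 0, 1, 1, 1, 1),
  age     = c(30, 35, 40, 45, 40, 45, 50, 55),
  matched = c(0, 1, 1, 1, 1, 1, 1, 0)
)

# Hypothetical helpers: weighted mean and (population-style) weighted variance
w_mean <- function(x, w) sum(w * x) / sum(w)
w_var <- function(x, w) {
  m <- w_mean(x, w)
  sum(w * (x - m)^2) / sum(w)
}

# Hypothetical helper: weighted SMD between the two levels of g
weighted_smd <- function(x, g, w) {
  lv <- sort(unique(g))
  ref <- g == lv[1]
  cmp <- g == lv[2]
  pooled_sd <- sqrt((w_var(x[ref], w[ref]) + w_var(x[cmp], w[cmp])) / 2)
  (w_mean(x[cmp], w[cmp]) - w_mean(x[ref], w[ref])) / pooled_sd
}

# Using the 0/1 indicator as a weight ...
smd_indicator <- weighted_smd(toy$age, toy$group, toy$matched)

# ... matches dropping the unmatched rows and weighting the rest equally
kept <- toy[toy$matched == 1, ]
smd_subset <- weighted_smd(kept$age, kept$group, rep(1, nrow(kept)))

all.equal(smd_indicator, smd_subset)
```

Rows with weight 0 contribute nothing to the weighted means or variances, so the indicator acts as a row filter.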

include_observed

Logical. If using .wts, also calculate the unweighted SMD?

include_unweighted

Deprecated. Please use include_observed.

na.rm

Logical. Remove NA values from the variables in .vars before calculating the SMD? Defaults to FALSE.

gref

An integer indicating which level of .group to use as the reference group. Defaults to 1.

std.error

Logical. Compute standard errors using compute_smd_var()? Defaults to FALSE.

make_dummy_vars

Logical. Transform categorical variables to dummy variables using model.matrix()? By default, smd::smd() uses a summary value based on the Mahalanobis distance to approximate the SMD of categorical variables. An alternative approach is to transform categorical variables to a set of dummy variables and calculate an SMD for each one.
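
As a rough picture of what a dummy-variable expansion looks like, the base R sketch below expands a factor with model.matrix(). The toy data and the formula (including dropping the intercept with - 1) are assumptions made for illustration; the exact formula and contrasts tidy_smd() uses internally are not shown here.

```r
# Toy categorical variable (hypothetical values, for illustration only)
toy <- data.frame(
  education = factor(c("hs", "college", "hs", "grad", "college", "grad"))
)

# Expand the factor into one 0/1 indicator column per level.
# Dropping the intercept (- 1) keeps a column for every level.
dummies <- model.matrix(~ education - 1, data = toy)

colnames(dummies)
head(dummies)
```

Each indicator column can then be treated like any other numeric variable, yielding one SMD per level instead of a single summary value for the whole categorical variable.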

Value

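A tibble with the columns variable, method, and smd, plus a column named after .group that contains the level of .group the SMD represents. If std.error = TRUE, a std.error column is also included.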

Examples


tidy_smd(nhefs_weights, c(age, education, race), .group = qsmk)
#> # A tibble: 3 × 4
#>   variable  method   qsmk     smd
#>   <chr>     <chr>    <chr>  <dbl>
#> 1 age       observed 1     -0.282
#> 2 education observed 1      0.196
#> 3 race      observed 1      0.177
tidy_smd(nhefs_weights, c(age, education), .group = qsmk, std.error = TRUE)
#> # A tibble: 2 × 5
#>   variable  method   qsmk     smd std.error
#>   <chr>     <chr>    <chr>  <dbl>     <dbl>
#> 1 age       observed 1     -0.282    0.0580
#> 2 education observed 1      0.196    0.0579

tidy_smd(
  nhefs_weights,
  c(age, race, education),
  .group = qsmk,
  .wts = c(w_ate, w_att, w_atm)
)
#> # A tibble: 12 × 4
#>    variable  method   qsmk       smd
#>    <chr>     <chr>    <chr>    <dbl>
#>  1 age       observed 1     -0.282  
#>  2 race      observed 1      0.177  
#>  3 education observed 1      0.196  
#>  4 age       w_ate    1     -0.00585
#>  5 race      w_ate    1      0.00703
#>  6 education w_ate    1      0.0347 
#>  7 age       w_att    1     -0.0122 
#>  8 race      w_att    1      0.00325
#>  9 education w_att    1      0.0289 
#> 10 age       w_atm    1     -0.00187
#> 11 race      w_atm    1      0.00102
#> 12 education w_atm    1      0.00961