Tidy Standardized Mean Differences — tidy_smd

tidy_smd() calculates the standardized mean difference (SMD) between groups for variables in a dataset. Optionally, you may also calculate weighted SMDs. tidy_smd() wraps smd::smd(), returning a tidy data frame with the columns variable, method, and smd, as well as a fourth column that contains the level of .group the SMD represents. You may also supply multiple weights to calculate multiple weighted SMDs, which is useful when comparing different types of weights. Additionally, the .wts argument supports matched datasets, where the variable supplied to .wts is a binary variable indicating whether the row was included in the match. If you're using MatchIt, the helper function bind_matches() will bind these indicators to the original dataset, making it easier to compare across matching specifications.
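
To make the quantity concrete, here is a minimal base R sketch of one common SMD definition for a continuous variable: the difference in group means divided by the pooled standard deviation. The toy data frame and the names x0, x1, pooled_sd, and smd_manual are hypothetical, and this is an illustration only; the exact estimator and sign convention used by smd::smd() (which depends on the reference group) may differ.

```r
# Hypothetical toy data, not taken from nhefs_weights
toy <- data.frame(
  group = c(0, 0, 0, 0, 1, 1, 1, 1),
  age   = c(30, 35, 40, 45, 40, 45, 50, 55)
)

# Split the variable by group
x0 <- toy$age[toy$group == 0]
x1 <- toy$age[toy$group == 1]

# Pooled standard deviation: square root of the average group variance
pooled_sd <- sqrt((var(x0) + var(x1)) / 2)

# Difference in means, scaled by the pooled standard deviation
smd_manual <- (mean(x1) - mean(x0)) / pooled_sd
smd_manual
```

Because the SMD is expressed in standard deviation units, values for variables on different scales (for example, age and education) can be compared directly.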

tidy_smd(
  .df,
  .vars,
  .group,
  .wts = NULL,
  include_observed = TRUE,
  include_unweighted = NULL,
  na.rm = FALSE,
  gref = 1L,
  std.error = FALSE,
  make_dummy_vars = FALSE
)

Arguments

.df: A data frame
.vars: Variables for which to calculate SMD. Can be unquoted (x) or quoted ("x").
.group: Grouping variable. Can be unquoted (x) or quoted ("x").
.wts: Variables to use for weighting the SMD calculation. These can be, for instance, propensity score weights or a binary indicator signaling whether or not a participant was included in a matching algorithm. Can be unquoted (x) or quoted ("x").
include_observed: Logical. If using .wts, also calculate the unweighted SMD?
include_unweighted: Deprecated. Please use include_observed.
na.rm: Logical. Remove NA values from the variables before calculating the SMD? Defaults to FALSE.
gref: An integer indicating which level of .group to use as the reference group. Defaults to 1.
std.error: Logical indicator for computing standard errors using compute_smd_var. Defaults to FALSE.
make_dummy_vars: Logical. Transform categorical variables to dummy variables using model.matrix()? By default, smd::smd() uses a summary value based on the Mahalanobis distance to approximate the SMD of categorical variables. An alternative approach is to transform categorical variables to a set of dummy variables. Defaults to FALSE.
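
As a rough illustration of the dummy-variable approach described for make_dummy_vars, the base R sketch below expands a factor into indicator columns with model.matrix(). The toy data and the formula are assumptions made for illustration; the exact call tidy_smd() makes internally, including whether a reference level is kept, is not shown in this documentation.

```r
# Hypothetical toy data; the column name and levels are illustrative only
toy <- data.frame(
  education = factor(c("hs", "college", "hs", "grad"))
)

# One indicator column per level. The `- 1` drops the intercept so that
# no level is absorbed into a reference category.
dummies <- model.matrix(~ education - 1, data = toy)

# Column names combine the variable name and the level
colnames(dummies)

# Each row has exactly one indicator set to 1
rowSums(dummies)
```

With dummy variables, each level gets its own SMD rather than a single summary value for the whole categorical variable.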

Value

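A tibble with the columns variable, method, and smd, plus a column named after .group that contains the level of .group the SMD represents. When std.error = TRUE, a std.error column is also included.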

Examples


tidy_smd(nhefs_weights, c(age, education, race), .group = qsmk)
#> # A tibble: 3 × 4
#>   variable  method   qsmk     smd
#>   <chr>     <chr>    <chr>  <dbl>
#> 1 age       observed 1     -0.282
#> 2 education observed 1      0.196
#> 3 race      observed 1      0.177
tidy_smd(nhefs_weights, c(age, education), .group = qsmk, std.error = TRUE)
#> # A tibble: 2 × 5
#>   variable  method   qsmk     smd std.error
#>   <chr>     <chr>    <chr>  <dbl>     <dbl>
#> 1 age       observed 1     -0.282    0.0580
#> 2 education observed 1      0.196    0.0579

tidy_smd(
  nhefs_weights,
  c(age, race, education),
  .group = qsmk,
  .wts = c(w_ate, w_att, w_atm)
)
#> Error in map(.x, .f, ...): ℹ In index: 1.
#> Caused by error in `dplyr::reframe()`:
#> ℹ In argument: `dplyr::across(...)`.
#> Caused by error in `across()`:
#> ! Can't compute column `age`.
#> Caused by error:
#> ! error in evaluating the argument 'w' in selecting a method for function 'smd': Can't convert `x` <psw> to <double>.