tidy_smd() calculates the standardized mean difference (SMD) for variables
in a dataset between groups. Optionally, you may also calculate weighted
SMDs. tidy_smd() wraps smd::smd(), returning a tidy data frame with the
columns variable, method, and smd, as well as a fourth column that contains
the level of .group the SMD represents. You may also supply multiple weights
to calculate multiple weighted SMDs, which is useful when comparing different
types of weights. Additionally, the .wts argument supports matched datasets,
where the variable supplied to .wts is a binary variable indicating whether
the row was included in the match. If you're using MatchIt, the helper
function bind_matches() will bind these indicators to the original dataset,
making it easier to compare across matching specifications.
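To make the quantity concrete, here is a minimal base R sketch of a weighted SMD for a binary group. The helper smd_sketch() and the toy data are hypothetical, and the pooled-variance form and sign convention are one common choice; smd::smd() may differ in both. The sketch also shows why a 0/1 match indicator can be passed as a weight: rows with weight 0 drop out of the calculation.

```r
# Hypothetical helper, not part of any package: a weighted SMD for a binary group.
smd_sketch <- function(x, g, w = rep(1, length(x))) {
  # Weighted mean and population-style weighted variance within one group
  w_mean <- function(x, w) sum(w * x) / sum(w)
  w_var <- function(x, w) sum(w * (x - w_mean(x, w))^2) / sum(w)

  in_g1 <- g == 1
  # Assumed sign convention: group 1 minus group 0
  mean_diff <- w_mean(x[in_g1], w[in_g1]) - w_mean(x[!in_g1], w[!in_g1])
  pooled_sd <- sqrt((w_var(x[in_g1], w[in_g1]) + w_var(x[!in_g1], w[!in_g1])) / 2)
  mean_diff / pooled_sd
}

# Toy data (made up for illustration)
g <- c(0, 0, 0, 0, 1, 1, 1, 1)
x <- c(1, 2, 3, 4, 3, 4, 5, 6)

# Unweighted SMD
smd_sketch(x, g)

# A 0/1 "matched" indicator used as a weight gives the same result
# as computing the SMD on the matched rows only
matched <- c(1, 1, 0, 1, 1, 0, 1, 1)
smd_sketch(x, g, w = matched)
smd_sketch(x[matched == 1], g[matched == 1])
```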
tidy_smd(
  .df,
  .vars,
  .group,
  .wts = NULL,
  include_observed = TRUE,
  include_unweighted = NULL,
  na.rm = FALSE,
  gref = 1L,
  std.error = FALSE,
  make_dummy_vars = FALSE
)
.df: A data frame.
.vars: Variables for which to calculate the SMD.
.group: Grouping variable.
.wts: Variables to use for weighting the SMD calculation. These can be, for instance, propensity score weights or a binary indicator signaling whether or not a participant was included in a matching algorithm.
include_observed: Logical. If using .wts, also calculate the unweighted SMD?
include_unweighted: Deprecated. Please use include_observed.
na.rm: Remove NA values from x? Defaults to FALSE.
gref: An integer indicating which level of g to use as the reference group. Defaults to 1.
std.error: Logical indicator for computing standard errors using compute_smd_var. Defaults to FALSE.
make_dummy_vars: Logical. Transform categorical variables to dummy variables using model.matrix()? By default, smd::smd uses a summary value based on the Mahalanobis distance to approximate the SMD of categorical variables. An alternative approach is to transform categorical variables to a set of dummy variables.
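As a rough illustration of what a dummy-variable transformation looks like, the following base R sketch expands a factor with model.matrix(). The data frame and the formula (which drops the intercept so that every level gets its own column) are assumptions for illustration; the exact call tidy_smd() makes internally is not shown here.

```r
# Toy categorical variable (values made up for illustration)
df <- data.frame(education = factor(c("1", "2", "3", "2", "1")))

# One indicator column per level; "- 1" removes the intercept column
dummies <- model.matrix(~ education - 1, data = df)
dummies

# Each row has exactly one indicator set to 1
rowSums(dummies)
```

With indicators like these, an SMD can be computed per level rather than as a single summary value for the whole variable.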
a tibble
tidy_smd(nhefs_weights, c(age, education, race), .group = qsmk)
#> # A tibble: 3 × 4
#>   variable  method   qsmk     smd
#>   <chr>     <chr>    <chr>  <dbl>
#> 1 age       observed 1     -0.282
#> 2 education observed 1      0.196
#> 3 race      observed 1      0.177
tidy_smd(nhefs_weights, c(age, education), .group = qsmk, std.error = TRUE)
#> # A tibble: 2 × 5
#>   variable  method   qsmk     smd std.error
#>   <chr>     <chr>    <chr>  <dbl>     <dbl>
#> 1 age       observed 1     -0.282    0.0580
#> 2 education observed 1      0.196    0.0579
tidy_smd(
  nhefs_weights,
  c(age, race, education),
  .group = qsmk,
  .wts = c(w_ate, w_att, w_atm)
)
#> # A tibble: 12 × 4
#>    variable  method   qsmk       smd
#>    <chr>     <chr>    <chr>    <dbl>
#>  1 age       observed 1     -0.282
#>  2 race      observed 1      0.177
#>  3 education observed 1      0.196
#>  4 age       w_ate    1     -0.00585
#>  5 race      w_ate    1      0.00664
#>  6 education w_ate    1      0.0347
#>  7 age       w_att    1     -0.0120
#>  8 race      w_att    1      0.00365
#>  9 education w_att    1      0.0267
#> 10 age       w_atm    1     -0.00184
#> 11 race      w_atm    1      0.00113
#> 12 education w_atm    1      0.00934
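Because the result is in long format, comparing weighting schemes is a simple grouped summary. The sketch below is one possible way to do this in base R: it re-enters the SMD values printed in the last example by hand (rather than calling tidy_smd()) and reports the largest absolute SMD for each method. The object names smds and max_abs are hypothetical.

```r
# SMD values copied from the printed output of the last example
smds <- data.frame(
  variable = rep(c("age", "race", "education"), times = 4),
  method = rep(c("observed", "w_ate", "w_att", "w_atm"), each = 3),
  smd = c(
    -0.282, 0.177, 0.196,
    -0.00585, 0.00664, 0.0347,
    -0.0120, 0.00365, 0.0267,
    -0.00184, 0.00113, 0.00934
  )
)

# Largest absolute SMD per method, sorted from smallest to largest
max_abs <- tapply(abs(smds$smd), smds$method, max)
max_abs[order(max_abs)]
```

In these printed results, every weighted method has a smaller maximum absolute SMD than the observed data.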