This family of functions computes propensity score weights for various causal estimands:
ATE (Average Treatment Effect)
ATT (Average Treatment Effect on the Treated)
ATU (Average Treatment Effect on the Untreated, sometimes called the ATC, where the "C" stands for "control")
ATM (Average Treatment Effect for the Evenly Matchable)
ATO (Average Treatment Effect for the Overlap population)
Entropy (Average Treatment Effect for the Entropy-weighted population)
The propensity score can be provided as a numeric vector of predicted probabilities, as a data.frame where each column represents the predicted probability for a level of the exposure, or as a fitted GLM object. It can also be a propensity score object created by ps_trim(), ps_refit(), or ps_trunc().
The returned weights are encapsulated in a psw object, which is a numeric vector with additional attributes that record the estimand and whether the weights have been stabilized, trimmed, or truncated.
Usage
wt_ate(
.propensity,
.exposure,
.sigma = NULL,
exposure_type = c("auto", "binary", "categorical", "continuous"),
.treated = NULL,
.untreated = NULL,
stabilize = FALSE,
stabilization_score = NULL,
...
)
# S3 method for class 'data.frame'
wt_ate(
.propensity,
.exposure,
.sigma = NULL,
exposure_type = c("auto", "binary", "categorical", "continuous"),
.treated = NULL,
.untreated = NULL,
stabilize = FALSE,
stabilization_score = NULL,
...,
.propensity_col = NULL
)
wt_att(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
focal = NULL
)
# S3 method for class 'data.frame'
wt_att(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
.propensity_col = NULL,
focal = NULL
)
wt_atu(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
focal = NULL
)
# S3 method for class 'data.frame'
wt_atu(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
.propensity_col = NULL,
focal = NULL
)
wt_atm(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...
)
# S3 method for class 'data.frame'
wt_atm(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
.propensity_col = NULL
)
wt_ato(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...
)
# S3 method for class 'data.frame'
wt_ato(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
.propensity_col = NULL
)
wt_entropy(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...
)
# S3 method for class 'data.frame'
wt_entropy(
.propensity,
.exposure,
exposure_type = c("auto", "binary", "categorical"),
.treated = NULL,
.untreated = NULL,
...,
.propensity_col = NULL
)
Arguments
- .propensity
Either a numeric vector of predicted probabilities, a data.frame where each column corresponds to a level of the exposure, or a fitted GLM object. For data frames, the second column is used by default for binary exposures unless specified otherwise with .propensity_col. For GLM objects, fitted values are extracted automatically.
- .exposure
The exposure variable. For binary exposures, a vector of 0s and 1s; for continuous exposures, a numeric vector.
- .sigma
For continuous exposures, a numeric vector of standard errors used with dnorm(). For example, this can be derived from the influence measures of a model (e.g., influence(model)$sigma).
- exposure_type
Character string specifying the type of exposure. Options are "auto", "binary", "categorical", and, for wt_ate() only, "continuous". Defaults to "auto", which detects the type automatically.
- .treated
The value representing the treatment group. If not provided, it is automatically detected.
- .untreated
The value representing the control group. If not provided, it is automatically detected.
- stabilize
Logical indicating whether to stabilize the weights. For ATE weights, stabilization multiplies the weight by either the mean of .exposure or the supplied stabilization_score. Note: stabilization is only supported for ATE and continuous exposures.
- stabilization_score
Optional numeric value for stabilizing the weights (e.g., a predicted value from a regression model without predictors). Only used when stabilize is TRUE.
- ...
Reserved for future expansion. Not currently used.
- .propensity_col
With a binary exposure, when .propensity is a data frame, specifies which column to use for propensity scores. Can be a column name (quoted or unquoted) or a numeric index. Defaults to the second column if available, otherwise the first. For categorical exposures, the entire data frame is used as a matrix of propensity scores.
- focal
For categorical exposures with ATT or ATU estimands, specifies the focal category. Must be one of the levels of the exposure variable. Required for wt_att() and wt_atu() with categorical exposures.
Value
A psw object (a numeric vector) with additional attributes:
estimand: A description of the estimand (e.g., "ate", "att").
stabilized: A logical flag indicating if stabilization was applied.
trimmed: A logical flag indicating if the weights are based on trimmed propensity scores.
truncated: A logical flag indicating if the weights are based on truncated propensity scores.
Details
Theoretical Background
Propensity score weighting is a method for estimating causal effects by creating a pseudo-population where the exposure is independent of measured confounders. The propensity score, \(e(X)\), is the probability of receiving treatment given observed covariates \(X\). By weighting each observation by the inverse of the probability of the exposure it actually received, we can balance the distribution of covariates between treatment groups. Other weights allow for different target populations.
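As a rough illustration of the balancing claim above, the following base R sketch simulates one confounder, builds inverse probability weights from the true propensity score, and compares the difference in confounder means between exposure groups before and after weighting. The simulated data and all object names (x, e, a, w) are assumptions made for the example; it does not call any function from this package.

```r
# Sketch: inverse probability weighting balances a confounder across groups.
# All names below are illustrative and not part of the package.
set.seed(2024)
n <- 5000
x <- rnorm(n)          # a single measured confounder
e <- plogis(x)         # true propensity score e(X)
a <- rbinom(n, 1, e)   # binary exposure drawn from e(X)

# Weight = 1 / P(observed exposure | X)
w <- a / e + (1 - a) / (1 - e)

# Difference in mean confounder between groups, before and after weighting
diff_raw <- mean(x[a == 1]) - mean(x[a == 0])
diff_wtd <- weighted.mean(x[a == 1], w[a == 1]) -
  weighted.mean(x[a == 0], w[a == 0])

c(unweighted = diff_raw, weighted = diff_wtd)
```

In this setup the weighted difference should sit much closer to zero than the unweighted one, which is what "balance" means in practice.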
Mathematical Formulas
Binary Exposures
For binary treatments (\(A = 0\) or \(1\)), the weights are:
ATE: \(w = \frac{A}{e(X)} + \frac{1-A}{1-e(X)}\)
ATT: \(w = A + \frac{(1-A) \cdot e(X)}{1-e(X)}\)
ATU: \(w = \frac{A \cdot (1-e(X))}{e(X)} + (1-A)\)
ATM: \(w = \frac{\min(e(X), 1-e(X))}{A \cdot e(X) + (1-A) \cdot (1-e(X))}\)
ATO: \(w = A \cdot (1-e(X)) + (1-A) \cdot e(X)\)
Entropy: \(w = \frac{h(e(X))}{A \cdot e(X) + (1-A) \cdot (1-e(X))}\), where \(h(e) = -[e \cdot \log(e) + (1-e) \cdot \log(1-e)]\)
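The binary formulas above translate almost line for line into base R. The helper below, binary_weights(), is a hypothetical name introduced only for this sketch; it is not the package's implementation and skips the input handling that the wt_*() functions perform.

```r
# Hypothetical helper: binary-exposure weights written directly from the formulas.
binary_weights <- function(e, a) {
  stopifnot(length(e) == length(a), all(e > 0 & e < 1), all(a %in% c(0, 1)))
  # Probability of the exposure each unit actually received
  p_obs <- a * e + (1 - a) * (1 - e)
  # Entropy tilting function h(e)
  h <- -(e * log(e) + (1 - e) * log(1 - e))
  data.frame(
    ate = a / e + (1 - a) / (1 - e),
    att = a + (1 - a) * e / (1 - e),
    atu = a * (1 - e) / e + (1 - a),
    atm = pmin(e, 1 - e) / p_obs,
    ato = a * (1 - e) + (1 - a) * e,
    entropy = h / p_obs
  )
}

e <- c(0.2, 0.5, 0.8, 0.8)
a <- c(0, 1, 1, 0)
w <- binary_weights(e, a)
w
```

The last unit (a control with a propensity score of 0.8) gets an ATE weight of 5 but an ATO weight of only 0.8, which shows how the overlap-type estimands down-weight units that are unusual for their group.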
Continuous Exposures
For continuous treatments, weights use the density ratio: \(w = \frac{f_A(A)}{f_{A|X}(A|X)}\), where \(f_A\) is the marginal density of \(A\) and \(f_{A|X}\) is the conditional density given \(X\).
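A minimal base R sketch of the density ratio follows, assuming both densities are normal: the conditional density is centered on the fitted values of a linear model, and the marginal density uses the sample mean and standard deviation of the exposure. These modeling choices and the object names are assumptions for illustration; the package's exact computation (including how .sigma enters) may differ.

```r
# Sketch: density-ratio weights for a continuous exposure under normal models.
set.seed(1)
n <- 500
x <- rnorm(n)
a <- 0.5 * x + rnorm(n)   # continuous exposure that depends on x

# Conditional density f_{A|X}: normal around the fitted values of a linear model
fit <- lm(a ~ x)
cond_dens <- dnorm(a, mean = fitted(fit), sd = sigma(fit))

# Marginal density f_A: normal with the sample mean and SD of the exposure
marg_dens <- dnorm(a, mean = mean(a), sd = sd(a))

# Density-ratio weight: marginal over conditional
w <- marg_dens / cond_dens
summary(w)
```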
Categorical Exposures
For categorical treatments with \(K\) levels, weights use a tilting function approach: \(w_i = \frac{h(e_i)}{e_{i,Z_i}}\), where \(e_{i,Z_i}\) is the propensity score for unit \(i\)'s observed treatment level, and \(h(e_i)\) is a tilting function that depends on the estimand:
ATE: \(h(e) = 1\)
ATT: \(h(e) = e_{focal}\) (propensity score for the focal category)
ATU: \(h(e) = 1 - e_{focal}\) (complement of focal category propensity)
ATM: \(h(e) = \min(e_1, ..., e_K)\)
ATO: \(h(e) = 1 / \sum_k(1/e_k)\) (the harmonic mean of the propensity scores divided by \(K\))
Entropy: \(h(e) = -\sum_k[e_k \cdot \log(e_k)]\) (entropy of propensity scores)
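The tilting functions above can be sketched in base R for a matrix of propensity scores with one named column per exposure level. The helper categorical_weights() and its arguments are hypothetical names for this example only; it is not the package's implementation.

```r
# Hypothetical helper: categorical weights w_i = h(e_i) / e_{i, Z_i}.
# `e` is a matrix with one named column per level; `z` holds observed levels.
categorical_weights <- function(e, z, estimand, focal = NULL) {
  e <- as.matrix(e)
  stopifnot(nrow(e) == length(z), all(z %in% colnames(e)))
  if (estimand %in% c("att", "atu")) {
    stopifnot(length(focal) == 1, focal %in% colnames(e))
  }
  # Propensity score of each unit's observed level, e_{i, Z_i}
  e_obs <- e[cbind(seq_len(nrow(e)), match(z, colnames(e)))]
  # Tilting function h(e_i) for the requested estimand
  h <- switch(
    estimand,
    ate = rep(1, nrow(e)),
    att = e[, focal],
    atu = 1 - e[, focal],
    atm = apply(e, 1, min),
    ato = 1 / rowSums(1 / e),
    entropy = -rowSums(e * log(e)),
    stop("unknown estimand: ", estimand)
  )
  unname(h / e_obs)
}

e <- matrix(
  c(0.2, 0.3, 0.5,
    0.6, 0.3, 0.1),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
z <- c("c", "a")

categorical_weights(e, z, "ate")
categorical_weights(e, z, "att", focal = "a")
categorical_weights(e, z, "ato")
```

With only two columns, the ATO tilting function reduces to \(e(1-e)\), which reproduces the binary ATO weight.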
Exposure Types
The functions support different types of exposures:
binary: For dichotomous treatments (e.g. 0/1).
continuous: For numeric exposures. Here, weights are calculated via the normal density using dnorm().
categorical: For exposures with more than 2 categories. Requires .propensity to be a matrix or data frame with columns representing propensity scores for each category.
auto: Automatically detects the exposure type based on .exposure.
Stabilization
For ATE weights, stabilization can improve the performance of the estimator by reducing variance. When stabilize is TRUE and no stabilization_score is provided, the weights are multiplied by the mean of .exposure. Alternatively, if a stabilization_score is provided, it is used as the multiplier. Stabilized weights have the form: \(w_s = f_A(A) \times w\), where \(f_A(A)\) is the marginal probability or density.
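For a binary exposure, the description above (multiply the ATE weight by the mean of the exposure) can be sketched in a few lines of base R. The data and object names are made up for the example, and whether the package performs exactly this arithmetic internally is an assumption rather than something this page confirms.

```r
# Sketch: stabilizing binary ATE weights with the mean of the exposure.
e <- c(0.2, 0.5, 0.8, 0.8, 0.4)
a <- c(0, 1, 1, 0, 1)

w <- a / e + (1 - a) / (1 - e)   # unstabilized ATE weights
multiplier <- mean(a)            # marginal probability of exposure (0.6 here)
w_stab <- multiplier * w         # stabilized weights

rbind(unstabilized = w, stabilized = w_stab)
```

The stabilized weights keep the same relative ordering but sit on a smaller scale, so the largest weight drops from 5 to 3 in this toy example.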
Weight Properties and Diagnostics
Extreme weights can indicate:
Positivity violations (near 0 or 1 propensity scores)
Poor model specification
Lack of overlap between treatment groups
See the halfmoon package for tools to diagnose and visualize weights.
You can address extreme weights in several ways. The first is to modify the target population: use trimming, truncation, or alternative estimands (ATM, ATO, entropy). Another technique that can help is stabilization, which reduces variance of the weights.
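Two quick numeric checks can flag extreme weights before any plotting: the largest weight and the Kish effective sample size, \((\sum w)^2 / \sum w^2\). The effective sample size is a widely used summary that this page does not itself define, and the helper name ess() and the toy data are assumptions for this sketch.

```r
# Sketch: compare ATE and ATO weights on data with near-violations of positivity.
ess <- function(w) sum(w)^2 / sum(w^2)   # Kish effective sample size

e <- c(0.01, 0.02, 0.98, 0.99, 0.5, 0.5, 0.5, 0.5)
a <- c(1, 1, 0, 0, 0, 1, 0, 1)   # units 1-4 got the exposure their score made unlikely

w_ate <- a / e + (1 - a) / (1 - e)
w_ato <- a * (1 - e) + (1 - a) * e

c(max_ate = max(w_ate), ess_ate = ess(w_ate))
c(max_ato = max(w_ato), ess_ato = ess(w_ato))
```

Here the ATE weights reach 100 and leave an effective sample size below 4 out of 8 units, while the ATO weights stay below 1 and retain an effective sample size above 7.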
Trimmed and Truncated Weights
In addition to the standard weight functions, versions exist for trimmed and truncated propensity scores created by ps_trim(), ps_trunc(), and ps_refit(). These variants calculate the weights using the modified propensity scores (trimmed or truncated) and update the estimand attribute accordingly.
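To make the distinction concrete, the base R sketch below applies both ideas by hand before computing ATE weights. The 0.05 and 0.95 bounds, the use of NA for trimmed units, and all object names are assumptions for illustration; this is not how ps_trim() or ps_trunc() are implemented, and their arguments are not shown here.

```r
# Sketch: truncating vs. trimming propensity scores before weighting.
# The bounds are illustrative, not package defaults.
e <- c(0.01, 0.30, 0.60, 0.99)
a <- c(1, 0, 1, 0)
lower <- 0.05
upper <- 0.95

# Truncation: pull scores outside the bounds back to the nearest bound
e_trunc <- pmin(pmax(e, lower), upper)
w_trunc <- a / e_trunc + (1 - a) / (1 - e_trunc)

# Trimming: exclude units whose scores fall outside the bounds
keep <- e >= lower & e <= upper
w_trim <- ifelse(keep, a / e + (1 - a) / (1 - e), NA_real_)

rbind(truncated = w_trunc, trimmed = w_trim)
```

Truncation keeps every unit but caps its influence (weights of 100 become 20 here), whereas trimming removes the unit and therefore changes the population being described.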
References
For detailed guidance on causal inference in R, see Causal Inference in R by Malcolm Barrett, Lucy D'Agostino McGowan, and Travis Gerke.
Foundational Papers
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Estimand-Specific Methods
Li, L., & Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234. (ATM weights)
Li, F., Morgan, K. L., & Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400. (ATO weights)
Zhou, Y., Matsouaka, R. A., & Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research, 29(12), 3721-3756. (Entropy weights)
See also
psw() for details on the structure of the returned weight objects.
ps_trim(), ps_trunc(), and ps_refit() for handling extreme weights.
ps_calibrate() for calibrating weights.
Examples
## Basic Usage with Binary Exposures
# Simulate a simple dataset
set.seed(123)
n <- 100
propensity_scores <- runif(n, 0.1, 0.9)
treatment <- rbinom(n, 1, propensity_scores)
# Calculate different weight types
weights_ate <- wt_ate(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
weights_att <- wt_att(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
weights_atu <- wt_atu(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
weights_atm <- wt_atm(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
weights_ato <- wt_ato(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
weights_entropy <- wt_entropy(propensity_scores, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
# Compare weight distributions
summary(weights_ate)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.112 1.317 1.591 2.044 2.047 7.482
summary(weights_ato) # Often more stable than ATE
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.1005 0.2406 0.3713 0.3976 0.5113 0.8664
## Stabilized Weights
# Stabilization reduces variance
weights_ate_stab <- wt_ate(propensity_scores, treatment, stabilize = TRUE)
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`
## Handling Extreme Propensity Scores
# Create data with positivity violations: units 1-4 received the
# exposure that their propensity scores made very unlikely
ps_extreme <- c(0.01, 0.02, 0.98, 0.99, rep(0.5, 4))
trt_extreme <- c(1, 1, 0, 0, 0, 1, 0, 1)
# Standard ATE weights can be extreme
wt_extreme <- wt_ate(ps_extreme, trt_extreme)
#> ℹ Treating `.exposure` as binary
# Very large!
max(wt_extreme)
#> <psw{estimand = ate}[1]>
#> [1] 100
# ATO weights are bounded between 0 and 1
wt_extreme_ato <- wt_ato(ps_extreme, trt_extreme)
#> ℹ Treating `.exposure` as binary
# Much more reasonable
max(wt_extreme_ato)
#> <psw{estimand = ato}[1]>
#> [1] 0.99
# but they target a different population
estimand(wt_extreme_ato) # "ato"
#> [1] "ato"
## Working with Data Frames
# Example with custom data frame
ps_df <- data.frame(
control = c(0.9, 0.7, 0.3, 0.1),
treated = c(0.1, 0.3, 0.7, 0.9)
)
exposure <- c(0, 0, 1, 1)
# Uses second column by default (treated probabilities)
wt_ate(ps_df, exposure)
#> ℹ Treating `.exposure` as binary
#> ℹ Treating `.exposure` as binary
#> <psw{estimand = ate}[4]>
#> [1] 1.111111 1.428571 1.428571 1.111111
# Explicitly specify column by name
wt_ate(ps_df, exposure, .propensity_col = "treated")
#> ℹ Treating `.exposure` as binary
#> ℹ Treating `.exposure` as binary
#> <psw{estimand = ate}[4]>
#> [1] 1.111111 1.428571 1.428571 1.111111
# Or by position
wt_ate(ps_df, exposure, .propensity_col = 2)
#> ℹ Treating `.exposure` as binary
#> ℹ Treating `.exposure` as binary
#> <psw{estimand = ate}[4]>
#> [1] 1.111111 1.428571 1.428571 1.111111
## Working with GLM Objects
# Fit a propensity score model
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
treatment <- rbinom(n, 1, plogis(0.5 * x1 + 0.3 * x2))
ps_model <- glm(treatment ~ x1 + x2, family = binomial)
# Use GLM directly for weight calculation
weights_from_glm <- wt_ate(ps_model, treatment)
#> ℹ Treating `.exposure` as binary
#> ℹ Treating `.exposure` as binary
#> ℹ Setting treatment to `1`