Truncate (Winsorize) Propensity Scores

ps_trunc() sets out-of-range propensity scores to fixed bounding values (a form of winsorizing). It is an alternative to ps_trim(), which removes out-of-range scores (sets them to NA) instead of bounding them, after which the propensity scores are refit with ps_refit().

Usage

ps_trunc(
  ps,
  method = c("ps", "pctl", "cr"),
  lower = NULL,
  upper = NULL,
  .exposure = NULL,
  .treated = NULL,
  .untreated = NULL,
  ...
)

Arguments

ps

The propensity score, either a numeric vector between 0 and 1 for binary exposures, or a matrix/data.frame where each column represents propensity scores for each level of a categorical exposure.

method

One of "ps", "pctl", or "cr".

"ps": directly cut ps at [lower, upper]. For categorical exposures, uses symmetric truncation with lower as the threshold.
"pctl": use quantiles of ps as the bounding values. For categorical exposures, calculates quantiles across all propensity score values.
"cr": the common range of ps given .exposure, bounded by [min(ps[treated]), max(ps[untreated])] (binary exposures only).
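As a rough illustration of how the three methods above turn their inputs into a pair of bounds for a binary exposure, here is a minimal base R sketch. The helper trunc_bounds() and its arguments exposure and treated are hypothetical names, not part of the package. It assumes "pctl" uses R's default quantile() type 7 (the package may compute quantiles differently), and it requires explicit lower/upper values because the package's defaults are not stated on this page.

```r
# Hypothetical helper (not the package's code): returns c(lower, upper)
# bounds for a binary exposure under each method described above.
trunc_bounds <- function(ps, method = c("ps", "pctl", "cr"),
                         lower = NULL, upper = NULL,
                         exposure = NULL, treated = 1) {
  method <- match.arg(method)
  if (method %in% c("ps", "pctl")) {
    # This sketch does not guess the package's defaults, so bounds are required
    stopifnot(!is.null(lower), !is.null(upper))
  }
  switch(method,
    # "ps": bounds are taken literally on the probability scale
    ps = c(lower, upper),
    # "pctl": lower/upper are probabilities passed to quantile() (type 7 assumed)
    pctl = unname(quantile(ps, probs = c(lower, upper))),
    # "cr": [min(ps among treated), max(ps among untreated)]
    cr = {
      stopifnot(!is.null(exposure), length(exposure) == length(ps))
      is_treated <- exposure == treated
      c(min(ps[is_treated]), max(ps[!is_treated]))
    }
  )
}

# Toy data: treated scores are 0.2 and 0.6, untreated scores are 0.4 and 0.8
trunc_bounds(c(0.2, 0.4, 0.6, 0.8), "cr", exposure = c(1, 0, 1, 0))
#> [1] 0.2 0.8
```

Note that with "cr" nothing stops min(ps[treated]) from exceeding max(ps[untreated]) when the groups barely overlap; this sketch does not check for that case.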

lower, upper

Numeric bounds (method "ps") or quantile probabilities (method "pctl"). If NULL, defaults vary by method. For categorical exposures with method "ps", lower represents the truncation threshold (delta).

.exposure

For method "cr", a binary exposure vector. For categorical exposures, must be a factor or character vector.

.treated

The value representing the treatment group. If not provided, it is automatically detected.

.untreated

The value representing the control group. If not provided, it is automatically detected.

...

Additional arguments passed to methods.

Value

A ps_trunc object (numeric vector or matrix). It has an attribute ps_trunc_meta storing fields like method, lower_bound, and upper_bound.

Details

For binary exposures, for each \(ps[i]\):

If \(ps[i] < lower\_bound\), we set \(ps[i] = lower\_bound\).
If \(ps[i] > upper\_bound\), we set \(ps[i] = upper\_bound\).
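The two rules above amount to clamping each score into [lower_bound, upper_bound], which base R can express with pmax() and pmin(). The sketch below is illustrative only: winsorize_ps() and trim_ps() are hypothetical names, and trim_ps() simply mimics the set-to-NA behavior that ps_trim() is described as having, so the two approaches can be compared side by side.

```r
# Minimal sketch of the binary rule (not the package's code): values below
# lower_bound become lower_bound, values above upper_bound become
# upper_bound, and everything else is left unchanged.
winsorize_ps <- function(ps, lower_bound, upper_bound) {
  stopifnot(lower_bound <= upper_bound)
  pmin(pmax(ps, lower_bound), upper_bound)
}

# For contrast, a trimming-style rule sets out-of-range values to NA
# instead of bounding them.
trim_ps <- function(ps, lower_bound, upper_bound) {
  ifelse(ps < lower_bound | ps > upper_bound, NA_real_, ps)
}

ps <- c(0.02, 0.30, 0.55, 0.97)
winsorize_ps(ps, 0.05, 0.95)
#> [1] 0.05 0.30 0.55 0.95
trim_ps(ps, 0.05, 0.95)
#> [1]   NA 0.30 0.55   NA
```

Truncation keeps every observation (and so the sample size), whereas trimming drops the extreme units and therefore calls for refitting the model on what remains.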

For categorical exposures:

Each value below the threshold is set to the threshold.
Rows are then renormalized to sum to 1.
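A minimal base R sketch of the two categorical steps above, assuming the scores arrive as a numeric matrix (or data.frame) with one column per exposure level. The function name trunc_categorical() and the argument delta (standing in for the lower threshold) are hypothetical and chosen for illustration.

```r
# Minimal sketch of the categorical rule (not the package's code): every
# entry below delta is raised to delta, then each row is divided by its
# sum so it is again a probability distribution over exposure levels.
trunc_categorical <- function(ps, delta) {
  ps <- as.matrix(ps)
  ps[ps < delta] <- delta
  # Dividing by rowSums() recycles the row totals down each column,
  # so entry [i, j] is divided by the total of row i
  ps / rowSums(ps)
}

ps <- rbind(
  c(0.01, 0.49, 0.50),
  c(0.20, 0.30, 0.50)
)
colnames(ps) <- c("a", "b", "c")
out <- trunc_categorical(ps, delta = 0.05)
rowSums(out)
#> [1] 1 1
```

Because raising an entry increases its row total, renormalization can leave the raised entry slightly below delta (here 0.05 / 1.04, about 0.048). This sketch applies the two steps once and does not iterate.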

This approach is often called winsorizing.

Examples

set.seed(2)
n <- 30
x <- rnorm(n)
z <- rbinom(n, 1, plogis(0.4 * x))
fit <- glm(z ~ x, family = binomial)
ps <- predict(fit, type = "response")

# cap scores at the 99th percentile; lower = 0 (the minimum) leaves the low end unchanged
ps_trunc(ps, method = "pctl", lower = 0, upper = .99)
#> <ps_trunc{[0.341443426776033,0.805793268892769], method=pctl}[30]>
#>         1         2         3         4         5         6         7         8 
#> 0.5149714 0.6361298 0.7694837 0.4880712 0.6073989 0.6305169 0.6899234 0.5897388 
#>         9        10        11        12        13        14        15        16 
#> 0.8003122 0.6009455 0.6605909 0.7162599 0.5725720 0.4985231 0.7849940 0.3561684 
#>        17        18        19        20        21        22        23        24 
#> 0.7064972 0.6200818 0.7191624 0.6620999 0.8057933 0.4800637 0.7696302 0.7981060 
#>        25        26        27        28        29        30 
#> 0.6167236 0.3414434 0.6667225 0.5494305 0.6981704 0.6472363
