These datasets each contain 100 observations, and each was generated under a different data generating mechanism:
(1) A collider
(2) A confounder
(3) A mediator
(4) M-bias
Usage
causal_collider_time
causal_confounding_time
causal_mediator_time
causal_m_bias_time
causal_quartet_time
Format
causal_collider_time
: A dataframe with 100 rows and 7 variables:
covariate_baseline
: known factor measured at baseline
exposure_baseline
: exposure measured at baseline
outcome_baseline
: outcome measured at baseline
exposure_followup
: exposure measured at the followup visit (final time)
outcome_followup
: outcome measured at the followup visit (final time)
covariate_followup
: known factor measured at the followup visit (final time)
causal_confounding_time
: A dataframe with 100 rows and 7 variables:
covariate_baseline
: known factor measured at baseline
exposure_baseline
: exposure measured at baseline
outcome_baseline
: outcome measured at baseline
exposure_followup
: exposure measured at the followup visit (final time)
outcome_followup
: outcome measured at the followup visit (final time)
covariate_followup
: known factor measured at the followup visit (final time)
causal_mediator_time
: A dataframe with 100 rows and 7 variables:
covariate_baseline
: known factor measured at baseline
exposure_baseline
: exposure measured at baseline
outcome_baseline
: outcome measured at baseline
covariate_mid
: known factor measured at some mid-point
exposure_mid
: exposure measured at some mid-point
outcome_mid
: outcome measured at some mid-point
exposure_followup
: exposure measured at the followup visit (final time)
outcome_followup
: outcome measured at the followup visit (final time)
covariate_followup
: known factor measured at the followup visit (final time)
causal_m_bias_time
: A dataframe with 100 rows and 9 variables:
u1
: unmeasured factor
u2
: unmeasured factor
covariate_baseline
: known factor measured at baseline
exposure_baseline
: exposure measured at baseline
outcome_baseline
: outcome measured at baseline
exposure_followup
: exposure measured at the followup visit (final time)
outcome_followup
: outcome measured at the followup visit (final time)
covariate_followup
: known factor measured at the followup visit (final time)
An object of class tbl_df (inherits from tbl, data.frame) with 400 rows and 12 columns.
Details
There are two time points:
baseline
follow up
These datasets help demonstrate that a model that adjusts only for pre-exposure covariates (that is, covariates measured at baseline) is less prone to bias. Adjusting for only pre-exposure covariates "solves" the bias in datasets 1-3. It does not solve the bias in the data generated under the "M-bias" scenario. However, that scenario is more of a toy example: it has been shown many times that the assumptions needed for M-bias to hold are often not ones we see in practical data analysis.
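As a rough illustration of the point above, the sketch below fits the same baseline-adjusted model to datasets 1-3 and collects the estimated coefficient on the baseline exposure. It assumes the datasets are made available by loading the `quartets` package (the package name is not stated on this page), and the object names `datasets` and `exposure_estimates` are purely illustrative. For the collider dataset this is the same model as the second one in the Examples section.

```r
# Assumption: the datasets documented here ship with the `quartets` package.
library(quartets)

# Datasets 1-3, labelled by their data generating mechanism.
datasets <- list(
  collider = causal_collider_time,
  confounding = causal_confounding_time,
  mediator = causal_mediator_time
)

# Fit the same model to each dataset, adjusting only for the covariate
# measured at baseline, and keep the coefficient on the baseline exposure.
exposure_estimates <- vapply(
  datasets,
  function(d) {
    fit <- lm(outcome_followup ~ exposure_baseline + covariate_baseline, data = d)
    unname(coef(fit)["exposure_baseline"])
  },
  numeric(1)
)

# Named numeric vector with one estimate per dataset.
exposure_estimates
```

Swapping `covariate_baseline` for `covariate_followup` in the formula gives the post-exposure adjustment for comparison.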
References
Lucy D’Agostino McGowan, Travis Gerke & Malcolm Barrett (2023) Causal inference is not just a statistics problem, Journal of Statistics and Data Science Education, DOI: 10.1080/26939169.2023.2276446
Examples
## incorrect model because covariate is post-treatment
lm(outcome_followup ~ exposure_baseline + covariate_followup,
   data = causal_collider_time)
#>
#> Call:
#> lm(formula = outcome_followup ~ exposure_baseline + covariate_followup,
#> data = causal_collider_time)
#>
#> Coefficients:
#> (Intercept) exposure_baseline covariate_followup
#> -0.05656 0.41288 0.46683
#>
## correct model because covariate is pre-treatment
## even though the true mechanism dictates that the covariate is a collider,
## because the pre-exposure variable is used, the collider bias does not
## occur.
lm(outcome_followup ~ exposure_baseline + covariate_baseline,
   data = causal_collider_time)
#>
#> Call:
#> lm(formula = outcome_followup ~ exposure_baseline + covariate_baseline,
#> data = causal_collider_time)
#>
#> Coefficients:
#> (Intercept) exposure_baseline covariate_baseline
#> -0.10534 1.00312 -0.01173
#>