This dataset contains 11 observations generated by Francis Anscombe to demonstrate that statistical summary measures alone cannot capture the full relationship between two variables (here, x and y). Anscombe emphasized the importance of visualizing data prior to calculating summary statistics.

Usage

anscombe_outlier

Format

A data frame with 11 rows and 2 variables:

  • x: the x-variable

  • y: the y-variable

Details

This dataset has a linear relationship between x and y, with a single outlier.

Additionally, the following summary statistics hold approximately:

  • mean of x: 9

  • variance of x: 11

  • mean of y: 7.5

  • variance of y: 4.125

  • correlation between x and y: 0.816

  • linear regression of y on x: y = 3 + 0.5x

  • \(R^2\) for the regression: 0.67
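
The summaries listed above can be checked with a few lines of base R. This is a minimal sketch, not the package's own code: it assumes that anscombe_outlier matches Anscombe's third data set, and rebuilds it from the x3 and y3 columns of the built-in datasets::anscombe so that it runs without this package installed. The names dat, summary_stats, fit, and outlier are illustrative only.

```r
# Assumption: anscombe_outlier corresponds to Anscombe's third data set,
# which base R ships as columns x3 and y3 of datasets::anscombe.
dat <- data.frame(x = datasets::anscombe$x3, y = datasets::anscombe$y3)

# Summary statistics for each variable and their correlation.
summary_stats <- c(
  mean_x = mean(dat$x),
  var_x  = var(dat$x),
  mean_y = mean(dat$y),
  var_y  = var(dat$y),
  cor_xy = cor(dat$x, dat$y)
)

# Least-squares regression of y on x.
fit <- lm(y ~ x, data = dat)

print(round(summary_stats, 3))
print(round(coef(fit), 2))
print(round(summary(fit)$r.squared, 2))

# The observation with the largest absolute residual is the outlier.
outlier <- dat[which.max(abs(resid(fit))), ]
print(outlier)

# Plot the data with the fitted line; the outlier stands out in the plot
# even though none of the summaries above reveal it.
plot(dat$x, dat$y, xlab = "x", ylab = "y")
abline(fit)
```

If the package is loaded, replacing the first assignment with dat <- anscombe_outlier should give the same results, provided the assumption above holds.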

References

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". The American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899.