
This dataset contains 44 observations: 11 from each of the 4 datasets constructed by Francis Anscombe to demonstrate that summary statistics alone cannot capture the full relationship between two variables (here, x and y). Anscombe emphasized the importance of visualizing data before calculating summary statistics.

Usage

anscombe_quartet

Format

A data frame with 44 rows and 3 variables:

  • dataset: the dataset the values come from

  • x: the x-variable

  • y: the y-variable

Details

  • Dataset 1 has a linear relationship between x and y

  • Dataset 2 has a nonlinear relationship between x and y

  • Dataset 3 has a linear relationship between x and y with a single outlier

  • Dataset 4 has no relationship between x and y apart from a single outlier that serves as a high-leverage point
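The four patterns listed above can be checked numerically as well as by plotting. The following is a minimal sketch, not part of this package: it assumes the wide-format copy of the quartet that ships with base R as `datasets::anscombe` (columns `x1` to `x4` and `y1` to `y4`) rather than `anscombe_quartet` itself, and the object names are illustrative.

```r
# Assumption: use base R's wide-format copy of the quartet, not anscombe_quartet.
a <- datasets::anscombe

# Dataset 2: adding a quadratic term captures the curvature almost exactly.
fit2_quad <- lm(y2 ~ x2 + I(x2^2), data = a)
r2_quad <- summary(fit2_quad)$r.squared

# Dataset 3: the outlier is the observation with the largest absolute residual
# from the straight-line fit.
fit3 <- lm(y3 ~ x3, data = a)
outlier3 <- which.max(abs(resid(fit3)))

# Dataset 4: every x equals 8 except one point at x = 19, so that point
# alone determines the slope and has leverage 1.
fit4 <- lm(y4 ~ x4, data = a)
lev4 <- hatvalues(fit4)

print(r2_quad)
print(a[outlier3, c("x3", "y3")])
print(round(lev4, 3))
```

A leverage of 1 for the x = 19 observation means the fitted line passes exactly through it, which is why the regression for Dataset 4 matches the others despite there being no trend among the remaining points.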

In each of the datasets, the following summary statistics hold (to the precision shown, unless noted otherwise):

  • mean of x: 9

  • variance of x: 11

  • mean of y: 7.5

  • variance of y: approximately 4.125 (within 0.003)

  • correlation between x and y: 0.816

  • linear regression of y on x: y = 3 + 0.5x

  • \(R^2\) for the regression: 0.67
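The summaries listed above can be reproduced per dataset. This is a sketch under stated assumptions: it rebuilds a long table with the same three columns (`dataset`, `x`, `y`) from base R's `datasets::anscombe`, and it labels the datasets with the integers 1 to 4, which may differ from the labels stored in `anscombe_quartet`. The names `long` and `summaries` are illustrative.

```r
# Assumption: start from base R's wide-format quartet (x1..x4, y1..y4).
a <- datasets::anscombe

# Reshape to one row per observation with columns dataset, x, y.
# Integer labels 1..4 are an assumption made for this sketch.
long <- data.frame(
  dataset = rep(1:4, each = nrow(a)),
  x = unlist(a[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(a[paste0("y", 1:4)], use.names = FALSE)
)

# Compute the summary statistics and the least-squares fit for each dataset.
summaries <- do.call(rbind, lapply(split(long, long$dataset), function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(
    dataset   = d$dataset[1],
    mean_x    = mean(d$x),
    var_x     = var(d$x),
    mean_y    = mean(d$y),
    var_y     = var(d$y),
    cor_xy    = cor(d$x, d$y),
    intercept = coef(fit)[[1]],
    slope     = coef(fit)[[2]],
    r_squared = summary(fit)$r.squared
  )
}))

print(round(summaries, 3))
```

If `anscombe_quartet` is loaded, the same `split()` and `lapply()` steps should apply to it directly, since it already has the long layout.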

References

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899.