This dataset contains 44 observations, 11 from each of the 4 datasets constructed
by Francis Anscombe to demonstrate that summary statistics alone cannot capture
the full relationship between two variables (here, x and y). Anscombe emphasized
the importance of visualizing data before calculating summary statistics.
Format
A dataframe with 44 rows and 3 variables:
dataset: the dataset the values come from
x: the x-variable
y: the y-variable
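The layout above is a long (tidy) format: one row per observation, with the dataset column identifying which of the four sets it belongs to. A minimal Python sketch of that reshaping follows. It assumes Anscombe's published values (the same numbers as R's built-in `anscombe` data frame, which stores them wide as columns x1 to x4 and y1 to y4) and assumes the dataset labels are the strings "1" to "4"; the labels in the actual data may differ, and `to_long_format` is a hypothetical helper.

```python
# Anscombe's published values; datasets 1, 2 and 3 share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def to_long_format():
    """Hypothetical helper: one (dataset, x, y) row per observation, 4 x 11 = 44 rows."""
    rows = []
    for label, ys in Y.items():
        # Only dataset 4 uses its own x values.
        xs = X_4 if label == "4" else X_123
        rows.extend((label, x, y) for x, y in zip(xs, ys))
    return rows


rows = to_long_format()
print(len(rows))
print(rows[0])
```

Keeping all four sets in one table with a grouping column makes it easy to compute the same statistic per group or to draw one panel per dataset.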
Details
Dataset 1 has a linear relationship between x and y.
Dataset 2 shows a nonlinear relationship between x and y.
Dataset 3 has a linear relationship between x and y, with a single outlier.
Dataset 4 shows no relationship between x and y, apart from a single outlier that serves as a high-leverage point.
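The "high-leverage point" in dataset 4 can be made concrete. For a simple linear regression, the leverage (hat value) of observation \(i\) is \(h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}\), and the leverages sum to 2, the number of fitted coefficients. A short Python sketch, assuming Anscombe's published x values for dataset 4 (ten points at x = 8, one at x = 19); `leverages` is a hypothetical helper name.

```python
# Anscombe's x values for dataset 4: ten points at x = 8 and one at x = 19.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]


def leverages(xs):
    """Hat values h_i = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2 for y = a + b*x."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1 / n + (x - mean) ** 2 / sxx for x in xs]


h = leverages(x4)
print([round(v, 3) for v in h])
```

For the point at x = 19, \(\bar{x} = 9\) and \(\sum_j (x_j - \bar{x})^2 = 10 \cdot 1 + 100 = 110\), so \(h = 1/11 + 100/110 = 1\), while each point at x = 8 has \(h = 0.1\). A leverage of 1 means the fitted line must pass exactly through that point; without it, the remaining ten observations all share x = 8 and the slope could not be estimated at all.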
In each of the datasets, the following summary statistics hold, up to rounding:
mean of x: 9
variance of x: 11
mean of y: 7.5
variance of y: 4.125
correlation between x and y: 0.816
least-squares regression line of y on x: y = 3 + 0.5x
\(R^2\) for the regression: 0.67
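These statistics can be checked directly. A minimal Python sketch, assuming Anscombe's published values and using sample variances (denominator n - 1, as R's `var()` does), the Pearson correlation, and an ordinary least-squares fit of y on x; `summarize` and the dictionary keys are hypothetical names chosen for illustration.

```python
from math import sqrt

# Anscombe's published values; datasets 1, 2 and 3 share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    "1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    """Variance with an n - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def summarize(xs, ys):
    """Summary statistics for one dataset plus the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": sample_variance(xs),
        "mean_y": my,
        "var_y": sample_variance(ys),
        "cor": r,
        "intercept": my - slope * mx,
        "slope": slope,
        "r_squared": r ** 2,  # in simple regression, R^2 is the squared correlation
    }


results = {
    label: summarize(X_4 if label == "4" else X_123, ys) for label, ys in Y.items()
}
for label, stats in results.items():
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

All four printed rows are nearly identical (for example, the variances of y range from about 4.12 to 4.13), which is exactly Anscombe's point: only a plot of each dataset reveals how different they are.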