Tim Slekar at the Huffington Post tries to give readers a lecture in “stats 101” and ends up making an elementary error himself. Tim writes:

For those that forgot stat 101, correlation does not mean causation. Example: If you take a sample of people involved in automobile accidents on their way to work and ask the sample if they had breakfast and then checked the correlation between eating breakfast and automobile accidents — it would be through the roof. But what does this mean? Nothing from a cause and effect stand point. Eating breakfast does not cause car accidents — period. Remember, correlation does not mean causation. However, this simple statistical rule doesn’t seem to matter to the reformers.

If you took a sample of people involved in accidents and looked at the correlation between eating breakfast and being in an accident, it wouldn’t be “through the roof”; it would be undefined. Let X be a dummy variable equal to 1 if someone ate breakfast and 0 if they didn’t, and let Y be a dummy variable equal to 1 if someone was involved in an accident and 0 if they weren’t. A dataset like the one Tim is talking about might look like this:

X,Y
1,1
0,1
1,1
0,1
0,1
1,1
1,1
1,1
1,1

Notice that all of the Ys are equal to one, because Tim said we are sampling people who have been involved in an accident. The problem is, as they teach you in stats 101, the formula for correlation is cov(X,Y)/(std_dev(X)*std_dev(Y)), where cov is covariance, and std_dev is the standard deviation. The standard deviation of a variable that is always equal to 1 is zero. The covariance is zero too, since Y never moves away from its mean. And so the correlation works out to 0/0, which is undefined, not “through the roof”.
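If you want to check this yourself, here is a minimal Python sketch that runs the nine rows above through the stats 101 formula. The function name pearson and the choice of population (divide by n) formulas are my own assumptions for illustration; using the sample (n-1) versions would not change the outcome, because the standard deviation of Y is zero either way.

```python
import math

# The nine (X, Y) rows from the example dataset above.
rows = [(1, 1), (0, 1), (1, 1), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Return cov(X,Y) / (std_dev(X) * std_dev(Y)), or None when undefined.

    Hypothetical helper for illustration; uses population formulas.
    """
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sd_x = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sd_y = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    if sd_x == 0 or sd_y == 0:
        # Zero in the denominator: the correlation is undefined.
        return None
    return cov / (sd_x * sd_y)


xs = [x for x, _ in rows]
ys = [y for _, y in rows]
print(pearson(xs, ys))  # prints None
```

The function returns None rather than a number because every Y is 1, so std_dev(Y) is zero and there is nothing to divide by.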
