Erica Grieder points me to this post on the use of statistics in psychology. Key point:

When you drop the chemical on the mutant mice nerve cells, their firing rate drops, by 30%, say. With the number of mice you have this difference is statistically significant, and so unlikely to be due to chance. That’s a useful finding, which you can maybe publish. When you drop the chemical on the normal mice nerve cells, there is a bit of a drop, but not as much – let’s say 15%, which doesn’t reach statistical significance.

But here’s the catch. You can say there is a statistically significant effect for your chemical reducing the firing rate in the mutant cells. And you can say there is no such statistically significant effect in the normal cells. But you can’t say mutant and normal cells respond to the chemical differently: to say that, you would have to do a third statistical test, specifically comparing the "difference in differences", the difference between the chemical-induced change in firing rate for the normal cells against the chemical-induced change in the mutant cells.
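To make the three tests in the quoted passage concrete, here is a rough sketch in Python. The 30% and 15% drops come from the quote; the standard error of 10 percentage points per group is purely my assumption (the post gives none), as is the use of a simple z-test and the helper name `two_sided_p`.

```python
import math

def two_sided_p(estimate, std_err):
    """Two-sided p-value for a z-test of `estimate` against zero."""
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))

# Drops in firing rate (percentage points) from the quoted example.
mutant_drop = 30.0
normal_drop = 15.0

# ASSUMPTION: a standard error of 10 points in each group.
# The quoted post does not report one; this is only for illustration.
se = 10.0

# Tests one and two: each group's drop against zero.
p_mutant = two_sided_p(mutant_drop, se)
p_normal = two_sided_p(normal_drop, se)

# Test three: the "difference in differences" between the groups.
# For independent groups the variances add.
diff = mutant_drop - normal_drop
se_diff = math.sqrt(se**2 + se**2)
p_diff = two_sided_p(diff, se_diff)

print(f"mutant vs zero:   p = {p_mutant:.3f}")
print(f"normal vs zero:   p = {p_normal:.3f}")
print(f"mutant vs normal: p = {p_diff:.3f}")
```

Under these assumed standard errors, the mutant drop clears the 5% cutoff, the normal drop does not, and neither does the difference between them, which is exactly the situation the quote describes.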

Again this is one of those things that is difficult to frame because I want to respond “Yes, but more importantly no, and if you really think about it, ‘eh.’”

Strictly speaking the author is right. You cannot say there is a statistically significant difference in the response rates between mutant mice and normal mice.

However, what you can say is that the response rate of mutant mice differs significantly from zero while the response rate of non-mutant mice does not.

That clears up everything, right?

The ultimate problem, I think, is getting too much of a bug up one’s bum about the threshold of statistical significance. You did an experiment and you got some evidence. That evidence alters the way you think. It’s not like whoa, I discovered the next big thing if I get something at a 5% significance level but I just have a pile of poop if I get something at a 6% significance level.
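To put a number on how little separates 5% from 6%, here is a small sketch that converts each two-sided p-value back into the z-score that would produce it. It assumes a normally distributed test statistic, and `z_for_two_sided_p` is just a helper name I made up.

```python
from statistics import NormalDist

def z_for_two_sided_p(p):
    """z-score whose two-sided p-value equals `p` (normal test statistic assumed)."""
    return NormalDist().inv_cdf(1 - p / 2)

z_05 = z_for_two_sided_p(0.05)
z_06 = z_for_two_sided_p(0.06)

print(f"p = 0.05 corresponds to z = {z_05:.2f}")
print(f"p = 0.06 corresponds to z = {z_06:.2f}")
print(f"gap between them: {z_05 - z_06:.2f} standard errors")
```

The two results sit less than a tenth of a standard error apart, yet one gets called a finding and the other does not.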

However, because we concentrate on significance levels we say that the normal mice “didn’t respond”, while the mutant mice “did respond.”  That sounds like you are talking about a fundamental difference in the mice. And, since you are talking about a fundamental difference in the mice you ought to be able to say the mice are fundamentally different, right?

Well, no, because it’s an artifact of our significance cutoff. That we use this cutoff is a problem.

However, doing the “difference-in-differences” stat doesn’t really help overcome that, because you have just applied the same falsely rigid standard to another measurement.

Indeed, one can imagine the following scenario. There are three types of mice: Normal, Mutant and All Fucked Up (AFU).  The AFU mice are some ugly creatures and you really don’t want to get them mad. But, we’ll analyze their data anyway.

So using the numbers from the post: The normal mice see their firing rates drop by 15%. The mutant mice see their firing rates drop by 30%. And the AFU mice see their firing rates drop by 45%. Plus they turn green and eat the lab assistants! What kids won’t do for an RAship these days?

Now, let’s do difference-in-differences between normal and mutant. It fails to reach significance, so we can’t say they are different.

Now, let’s do difference-in-differences between mutant and AFU. It fails too, so we can’t say they are different.

Now, let’s do difference-in-differences between normal and AFU. It passes. Woo-hoo, we get our paper published. Too bad Sanjay got eaten before he got his first co-authorship. C’est la vie.

However, look at what we are saying. We can’t say that normals respond at all to the chemical. We can’t say that normals and mutants respond differently to the chemical. And, we can’t say that mutants and AFUs respond differently.

So are mutants more like normals or AFUs? We can’t say, because they are not significantly different from either. However, normals and AFUs are significantly different from each other. And mutants and AFUs share something in common: they both respond significantly to the chemical, whereas normals do not.
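Here is a rough Python sketch of the three-mouse scenario. The 15%, 30% and 45% drops are the numbers above; the standard error of 10 percentage points per group is my own assumption, picked because it happens to reproduce the pattern of passes and fails I described. The z-test and the helper `two_sided_p` are illustrative choices, not anything from the original post.

```python
import math
from itertools import combinations

def two_sided_p(estimate, std_err):
    """Two-sided p-value for a z-test of `estimate` against zero."""
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))

# Drops in firing rate (percentage points) for each type of mouse.
drops = {"normal": 15.0, "mutant": 30.0, "AFU": 45.0}

SE = 10.0     # ASSUMPTION: same standard error in every group
ALPHA = 0.05  # the conventional cutoff

# Does each group's drop differ significantly from zero?
vs_zero = {name: two_sided_p(d, SE) < ALPHA for name, d in drops.items()}

# Difference-in-differences for every pair of groups.
# For independent groups with equal SE, the SE of the difference is sqrt(2) * SE.
pairwise = {}
for a, b in combinations(drops, 2):
    se_diff = math.sqrt(2) * SE
    pairwise[(a, b)] = two_sided_p(drops[b] - drops[a], se_diff) < ALPHA

for name, significant in vs_zero.items():
    print(f"{name} vs zero: {'significant' if significant else 'not significant'}")
for (a, b), significant in pairwise.items():
    print(f"{a} vs {b}: {'significant' if significant else 'not significant'}")
```

With these assumed numbers, mutants and AFUs each respond significantly, normals do not, and the only pairwise comparison that passes is normal vs AFU. The underlying estimates are evenly spaced; the odd groupings come entirely from where the cutoff lands.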

It’s like the stats are telling you nothing and everything all at the same time. That’s because they have arbitrary cutoff points in them. If you get wrapped up in the cutoff points you will be chasing your tail. If you accept that the cutoff points are arbitrary then you can make sense of the world.

You can look at the data on normal mice. You can look at the data on mutant mice. Then you might say, well, that normal mice data really looks like chance. And that mutant data really looks like there is something going on here. So I am going to tentatively say that mutants are different than normals in their response to the chemical.

But it’s all shades of gray that push our beliefs in one direction or another. There is no meaningful, definitive cutoff that says yes, there is an effect, or no, there is not.