Friday, 17 April 2015

Not new points about multiple hypothesis testing

Say your paper has four independent tests of your hypotheses. Suppose that in fact all your hypotheses are false, so every null is true. Then your p-values are independently and uniformly distributed between 0 and 1. What's the chance of getting at least one result that is significant at the 5% level?

We can answer this with a one-liner in R:

> table(replicate(100000, all(runif(4)>0.05)))

FALSE  TRUE
18462 81538

The first count is the runs in which at least one p-value came in at or below 0.05. So you'll get a significance star about 18% of the time.

What about if you have eight hypotheses?

> table(replicate(100000, all(runif(8)>0.05)))

FALSE  TRUE
33663 66337

About one third of the time.
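The simulated proportions can be checked against the exact value. With k independent tests and every null true, the chance that no p-value falls below alpha is (1 - alpha)^k, so the chance of at least one star is 1 - (1 - alpha)^k. A minimal sketch (the variable names are just for illustration):

```r
# Chance of at least one p-value below alpha among k independent tests
# when every null is true: 1 - (1 - alpha)^k
alpha <- 0.05
k <- c(4, 8)
fwer <- 1 - (1 - alpha)^k
round(fwer, 4)
# 0.1855 0.3366
```

Both figures agree with the simulations to within sampling noise.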

Multiple testing really inflates your chance of a false positive, even if you test just a few hypotheses. This is not just a problem for people using genetic data and running millions of tests! But you almost never see a paper which corrects its p-values for multiple hypotheses. Perhaps this should change.
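For what it's worth, correcting is also nearly a one-liner. Here is a rough sketch using p.adjust from base R's stats package with the Holm method. The choice of Holm, the seed, the number of replications and the example p-values are all my own assumptions for illustration, not a recommendation for any particular paper.

```r
set.seed(1)      # arbitrary seed, only so the run is repeatable
alpha <- 0.05
k <- 8           # number of independent tests, all nulls true

# Adjust each simulated batch of k p-values, then ask whether any
# adjusted p-value still falls below alpha
any_sig_after <- replicate(20000,
  any(p.adjust(runif(k), method = "holm") < alpha))
mean(any_sig_after)   # should land close to 0.05 rather than one third

# Made-up p-values from a hypothetical paper with four tests
p_example <- c(0.003, 0.02, 0.04, 0.30)
p.adjust(p_example, method = "holm")
# 0.012 0.060 0.080 0.300
```

In the made-up example, three of the four raw p-values are below 0.05, but only one survives the adjustment.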