Simple Statistics Provide Poor Prognosis for Science!
In 2005 an article was published called Why most published research findings are false(1). I had skimmed it a few times, but recently I put in the time to read it thoroughly. It works through a simple statistical analysis to suggest that even if all research were performed using the absolute best scientific practices, most published findings would still be false. The article uses statistical jargon that can make it hard to follow, but it boils down to a criticism of the statistical output known as the P-Value. A P-Value is the probability of seeing a difference between the study groups at least as large as the one observed if there were truly no difference, that is, if chance alone were at work. A result is seen as significant if the P-Value is below 0.05 (a 5% probability of such a difference arising by chance alone). This threshold is set in advance and is referred to as Alpha. Another predetermined level is Beta, the probability of finding that the groups are the same when they are truly different. Beta is normally set (when it is set, which is rare) at 0.2 (or 20%), which gives a study a power of 1-Beta = 0.8 (or 80%) to detect a true difference. The P-Value and the power both depend on the size of the difference between the groups and the variation within the groups. So when there is a big difference between two groups and the within-group variation is low, the P-Value will be low.
There is a common lesson taught in undergraduate pharmacology and medical degrees about how robust medical tests are, and it normally goes like this: imagine a disease which occurs in 1 of every 10,000 people, and a test that is highly sensitive, with a 99.999% chance of detecting the disease if you have it. The test is also pretty specific, with a false positive rate of 0.1%. Now imagine we screen 10,000 people and your test comes back positive (saying you have the disease). What are the chances that you actually have the disease? You might say, well, it is very sensitive, so 99.999%, right? Well, one person of the 10,000 actually has the disease and they will almost certainly (99.999%) test positive, but there are also the false positives (0.1% x 10,000 = 10), so 11 people will test positive and only 1 of them actually has the disease. So your chances of having the disease if you test positive in this scenario are ~9%.
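If you want to check the screening arithmetic yourself, here is a minimal Python sketch. The function name and its arguments are made up for illustration; the numbers are the ones from the example above.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Chance that a positive test result is a true positive (Bayes' rule)."""
    # Fraction of everyone screened who has the disease AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of everyone screened who is healthy but tests positive anyway.
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Numbers from the example: 1 in 10,000 people have the disease, the test
# catches 99.999% of real cases, and 0.1% of healthy people test positive.
chance = positive_predictive_value(
    prevalence=1 / 10_000,
    sensitivity=0.99999,
    false_positive_rate=0.001,
)
print(f"Chance you have the disease given a positive test: {chance:.1%}")
```

This should land at roughly 9%, matching the 1-in-11 count above. The sketch works with fractions of the population rather than head counts, so it does not round the false positives to a whole 10 people.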
So why am I talking about disease tests? Well, we can repeat this analysis for scientific research. In this case the false positive rate is Alpha (normally 5%) and the false negative rate is Beta (normally 20%), which leaves a power of 1-Beta (80%) to detect a real effect. The actual rate of, say, discovering a cure is unknown, but let's generously say that 1 in every 100 compounds can effectively treat your disease. Then if we were to test 10,000 compounds and perform the calculation from above, there would be 100 true cures among them, but we would only detect 80 of these because our power is normally set to 80%. So we’d have 80 confirmed cures and 20 false negatives. The false positive rate is equal to Alpha, 5%, which means that purely by chance we would find roughly 500 compounds (strictly, 5% of the 9,900 duds, or 495) to be effective for the disease despite their not being so. With these back-of-a-napkin calculations, only about 14% (80 of roughly 580 positive results) of all scientific discoveries may be real. Obviously, there are a number of shortcomings in this analysis, but it gives scientists a bit of reason to worry. In fact there is now at least a ream of articles on my desk that criticize the use of P-Values, and it is becoming increasingly clear that trying to reduce the complex world of biology to three-digit probabilities was probably a bad idea. But the ease with which the P-Value can be interpreted and applied has made it universal in science. This meme may be hard to shake.
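For anyone who wants to rerun the compound-screening numbers, here is a rough Python sketch of the napkin calculation. The function name, its arguments and the dictionary keys are hypothetical, chosen only for illustration; the inputs are the assumptions used in this post (1 in 100 compounds truly works, an 80% chance of detecting a real effect, and Alpha of 5%).

```python
def screen_compounds(n_compounds, prior, power, alpha):
    """Napkin estimate of how many 'significant' results are real."""
    true_cures = n_compounds * prior                 # compounds that really work
    detected = true_cures * power                    # real cures we find
    missed = true_cures - detected                   # false negatives
    false_hits = (n_compounds - true_cures) * alpha  # duds that look effective
    return {
        "detected": detected,
        "missed": missed,
        "false_hits": false_hits,
        # Share of all positive findings that are actually real.
        "fraction_real": detected / (detected + false_hits),
    }


result = screen_compounds(n_compounds=10_000, prior=0.01, power=0.8, alpha=0.05)
print(f"Real cures detected: {result['detected']:.0f}")
print(f"Real cures missed:   {result['missed']:.0f}")
print(f"False positives:     {result['false_hits']:.0f}")
print(f"Fraction of positive findings that are real: {result['fraction_real']:.1%}")
```

Changing `prior` is the interesting experiment: the rarer a true effect is assumed to be, the smaller the share of significant results that are real, even with Alpha and power left untouched.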
(1) Ioannidis JPA (2005). Why most published research findings are false. PLoS Medicine 2:696-701.