Statistics and Reality
Posted by PLaplace on May 2, 2010
Fellow poster Dan Draney drew my attention to this article on statistics the other day. It’s a fascinating look at what statistical significance really means and how the notion is unwittingly abused by those in the sciences, predominantly in the fields of psychology and economics. As astute observers may have inferred from my pseudonym, my interest in mathematics and statistics is more than casual, and as such the topic of abuse of statistics is dear to my heart. Moreover, in a world where billions, if not trillions, of dollars ride on the reliability of statistical models and conclusions, a strong understanding of what statistical significance means is essential.
The beauty of statistics comes not from its ability to discover objective truth, which it cannot do, but rather from its ability to quantify uncertainty. At the heart of statistics lies probability theory, which is used to assign probabilities to experimental results. If, for instance, 12 out of 1,000 people suffer heart attacks when given a new drug, is that result different enough from what we might expect from blind chance to justify pulling the drug from the marketplace? It turns out that when one digs deeply into the laws of probability, one discovers that it is possible to prove a great many probabilistic conclusions from the simple assumption of randomness in sampling. Even if one knows absolutely nothing about an underlying population (for instance, how many people favor Barack Obama over John McCain), a properly constructed sample can yield measurable results. Hence the importance of always conducting surveys, clinical trials, and psychological experiments with completely random samples, i.e. calling random households or selecting random patients. This point is crucial, so I will repeat it: without a random sample, results mean nothing.
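To make the heart attack question concrete, here is a minimal sketch of an exact binomial calculation. It asks how likely 12 or more heart attacks among 1,000 people would be if the drug had no effect at all. The baseline heart attack rates used here (0.5% and 1%) are assumptions chosen purely for illustration, and the function name is hypothetical; a real analysis would take the baseline from a control group.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Computed as 1 minus the sum of the first k terms, so only
    small binomial coefficients are ever converted to floats.
    """
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Observed result from the example: 12 heart attacks among 1,000 patients.
n_patients = 1000
n_events = 12

# ASSUMED baseline rates (not from the article), for illustration only.
for baseline in (0.005, 0.01):
    tail = binomial_tail(n_patients, n_events, baseline)
    print(f"baseline {baseline:.1%}: P(12 or more by chance) = {tail:.4f}")
```

Under the lower assumed baseline the observed count would be quite unlikely to arise by chance, while under the higher one it would be unremarkable. The same 12 in 1,000 can therefore look alarming or ordinary depending entirely on what blind chance is expected to produce.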
As concerns politics, this is rarely a problem with the major polling firms such as Gallup or Rasmussen. Their statisticians are rigorous and extremely competent. One must, however, always be wary of polls put out by think tanks or political advocacy groups, especially if they talk about “weighting” the result for demographics or socioeconomic status, or apply any number of other data manipulation techniques. The instant this happens, all of the conclusions of the survey or study evaporate. Another issue is margins of error. These are critical to the quantification and understanding of statistical conclusions. If someone says that Barack Obama is leading John McCain by 49% to 47% amongst registered voters with a margin of error of 0.1%, that result is highly significant and I would put money on an Obama victory. If the margin of error were 3% (as is common for most polls), then we can’t really say much about Obama’s chances of winning. To put it in betting terms, a margin of error of 0.1% might correspond to something like 1000:1 odds or better of an Obama victory, while a margin of error of 3% would be much closer to even odds.
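The link between a margin of error and betting odds can be sketched with a normal approximation. The version below rests on several assumptions that are not in the text above: the reported margin of error is a 95% interval on each candidate's share, the standard error of the lead is roughly twice the standard error of a single share (reasonable when the two shares sum to nearly 100%), and "odds" means the odds that the leader is truly ahead among the sampled population, which is not the same thing as winning an election. The function name is hypothetical.

```python
from math import erfc, sqrt


def odds_leader_ahead(lead, margin_of_error, z_conf=1.96):
    """Approximate odds (x:1) that the poll leader is truly ahead.

    lead            -- observed lead as a fraction (0.02 for 49% vs 47%)
    margin_of_error -- reported margin on each share, as a fraction
    z_conf          -- z-value behind the margin (1.96 ASSUMES 95%)
    """
    se_share = margin_of_error / z_conf  # standard error of one share
    se_lead = 2 * se_share               # ASSUMED approximation for the lead
    z = lead / se_lead
    # Upper-tail normal probability via erfc, which stays nonzero
    # even when z is very large.
    p_behind = 0.5 * erfc(z / sqrt(2))
    return (1 - p_behind) / p_behind


lead = 0.49 - 0.47
for moe in (0.03, 0.001):
    print(f"margin of error {moe:.1%}: about {odds_leader_ahead(lead, moe):.3g}:1")
```

Under these assumptions the 3% case comes out at only a few to one, far from a confident call, while the 0.1% case comes out vastly beyond 1000:1. The exact figures depend heavily on the assumptions, but the contrast between the two cases is the point.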
Now, when it comes to predicting a presidential election, there are enough polls going around that a clear picture normally emerges, regardless of the results of any individual poll. But what about when the survey in question is only done once or twice, such as our heart attack experiment above, and billions of dollars might be on the line? In that case the difference between 10:1 odds of being right (often not enough of an outlier to justify action) and 1000:1 odds can be critical. Surveys and studies will almost always show differences between the populations in question, be they voters, medical subjects, or psychology volunteers. Being able to interpret what these differences actually mean is an essential skill, not just for statisticians but for anyone who encounters statistics on a regular basis. Given how often statistics are thrown around in academia, industry, and government, the ability to decipher statistical conclusions can often prove useful. While not everyone can devote years to the study of the mathematics behind it all, and few rigorous statistics courses are offered as part of the “standard” American college or high school education, taking some time to at least browse Wikipedia might well prove invaluable down the line. At the very least, one might find oneself better equipped to correct the erroneous conclusions of those who bandy about flawed studies without knowing their error. The world can always use a little more truth.