## Statistics and Reality

Posted by PLaplace on May 2, 2010

Fellow poster Dan Draney drew my attention to this article on statistics the other day. It’s a fascinating look at what statistical significance really means and how the notion is unwittingly abused by those in the sciences, predominantly in the fields of psychology and economics. As astute observers may have inferred from my pseudonym, my interest in mathematics and statistics is more than casual, and as such the topic of abuse of statistics is dear to my heart. Moreover, in a world where billions, if not trillions, of dollars ride on the reliability of statistical models and conclusions, a strong understanding of what statistical significance means is essential.

The beauty of statistics comes not from its ability to discover objective truth, which it cannot do, but rather from its ability to quantify uncertainty. At the heart of statistics lies probability theory, which is used to assign probabilities to experimental results. If, for instance, 12 out of 1,000 people suffer heart attacks when given a new drug, is that result different enough from what we might expect from blind chance to justify pulling the drug from the marketplace? It turns out that when one digs deeply into the laws of probability, one discovers that it is possible to prove many, many probabilistic conclusions from the simple assumption of randomness in sampling. Even if one knows absolutely nothing about an underlying population (for instance, how many people favor Barack Obama over John McCain), a properly constructed sample can yield measurable results. Hence the importance of always conducting surveys, clinical trials, and psychological experiments with completely random samples, e.g. calling random households or selecting random patients. This point is crucial, so I will repeat it: without a random sample, results mean nothing.
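To make the heart attack question concrete, here is a small sketch of the kind of calculation a statistician would run. The baseline rate of 0.5% is a made-up number purely for illustration; the idea is to ask how likely 12 or more events would be under blind chance alone, using the binomial distribution:

```python
import math

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing
    at least k events by blind chance alone."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Hypothetical numbers: suppose the background heart-attack rate
# over the trial period is 0.5%, so we'd expect about 5 events
# in 1,000 patients. How surprising are 12?
p_value = binom_tail(n=1000, p=0.005, k=12)
print(f"P(>= 12 events by chance) = {p_value:.4f}")
```

A tail probability well under 1% says the result is hard to attribute to chance, which is exactly the sort of evidence that could justify pulling the drug.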

As concerns politics, this is rarely a problem with the major polling firms such as Gallup or Rasmussen; their statisticians are rigorous and extremely competent. One must, however, always be wary of polls put out by think tanks or political advocacy groups, especially if they talk about “weighting” the results for demographics, socioeconomic status, or any number of other data-manipulation techniques. The instant this happens, all of the conclusions of the survey or study evaporate. Another issue is margins of error, which are critical to the quantification and understanding of statistical conclusions. If someone says that Barack Obama is leading John McCain by 49% to 47% amongst registered voters with a margin of error of 0.1%, that result is highly significant and I would put money on an Obama victory. If the margin of error were 3% (as is common for most polls), then we can’t really say anything about Obama’s chances of winning. To put it in betting terms, a margin of error of 0.1% might correspond to something like 1000:1 odds of an Obama victory, while a margin of error of 3% would be close to 1:1.
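The betting-odds intuition can be sketched numerically. The doubling of the standard error below is a common rule of thumb for the margin *between* two candidates (their shares move in opposite directions), so treat this as a back-of-the-envelope estimate, not a forecasting model:

```python
import math

def win_probability(lead_pts, moe_pts, z95=1.96):
    """Rough chance the leader is truly ahead, treating the observed
    lead as normally distributed. SE of the lead is taken as roughly
    2 * (MoE / 1.96) -- a rule-of-thumb sketch, not a rigorous model."""
    se_lead = 2 * moe_pts / z95
    z = lead_pts / se_lead
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# A 49-47 lead (2 points) under the two margins of error from the post:
for moe in (0.1, 3.0):
    p = win_probability(lead_pts=2.0, moe_pts=moe)
    print(f"MoE {moe}%: leader ahead with probability ~{p:.3f}")
```

With a 0.1% margin of error the 2-point lead is essentially certain, while with a 3% margin it comes out to only modest odds in the leader's favor, in line with the 1000:1 versus near-1:1 contrast above.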

Now when it comes to predicting a presidential election, there are enough polls going around that a clear picture normally emerges, regardless of the results of any individual poll. But what about when the survey in question is only done once or twice, such as our heart attack experiment above, and billions of dollars might be on the line? In that case the difference between 10:1 odds of being right (often not enough of an outlier to justify action) and 1000:1 odds can be critical. Surveys and studies will almost always show differences between the populations in question, be they voters, medical subjects, or psychology volunteers. Being able to interpret what these differences actually mean is an essential skill, not just for statisticians but for anyone who might encounter statistics on a regular basis. Given how often statistics are thrown around in academia, industry, and government, the ability to decipher statistical conclusions can often prove useful. While not everyone can devote years to the study of the mathematics behind it all, and few rigorous statistics courses are offered as part of the “standard” American college or high school education, taking some time to at least browse Wikipedia might well prove invaluable down the line. At the very least, one might find oneself better equipped to correct the erroneous conclusions of those who bandy about flawed studies without knowing their error. The world can always use a little more truth.

## Dan Draney said

I’m still not sure I know what “margin of error” means in a poll. Is +/- the MoE a 95% confidence interval? If so, does that just mean that if the same poll were given many times with the same methodology that 95% of the results would fall within that range?

## PLaplace said

Polls normally use a 95% confidence interval to derive their +/- 3 point margin of error (the sample size is usually chosen ahead of time to give this result, though larger samples will yield smaller margins of error). The easiest way to think about what a margin of error and confidence level mean is that there is a 95% chance the poll was done “correctly” and the margin of error captured the true proportion within its bounds. If, for instance, support for a bill came in at 47% with a +/- 3 point margin of error at 95% confidence, then there is a 95% chance that our poll captured the true proportion in the interval between 44% and 51%. Equivalently, one could say that there is a 5% chance that our poll is completely off, and that the true proportion is not between 44% and 51%. To put it visually, one can think of the margin of error as a net, and the confidence level as the chance that we caught our fish (the true proportion) within that net.
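The familiar +/- 3 points is not arbitrary; it falls out of the standard formula for the margin of error of a sample proportion, z * sqrt(p(1-p)/n), evaluated at the worst case p = 0.5. A quick sketch of where the usual poll sample sizes come from:

```python
import math

def margin_of_error(n, p=0.5, z95=1.96):
    """95% margin of error for a sample proportion, in percentage
    points: z * sqrt(p * (1-p) / n). p = 0.5 is the worst case,
    which is why pollsters quote it regardless of the actual split."""
    return z95 * math.sqrt(p * (1 - p) / n) * 100

# The classic +/- 3 points corresponds to a sample of roughly 1,000:
print(f"n = 1000  -> +/- {margin_of_error(1000):.2f} pts")
print(f"n = 1067  -> +/- {margin_of_error(1067):.2f} pts")
print(f"n = 10000 -> +/- {margin_of_error(10000):.2f} pts")
```

Note the square root: to cut the margin of error in half, a pollster must quadruple the sample, which is why most polls stop at about a thousand respondents.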

## Dan Draney said

I think it must actually be worse than that, though. For example, the questions themselves may introduce a bias towards one answer due to wording, or even due to the order they are asked.

Also, aren’t they assuming there is no sampling bias, and that only random aspects of the sample affect the results? While a well-designed, professional poll will guard against those problems, they can’t be eliminated completely in all cases. For example, certain groups may self-select disproportionately to participate in an exit poll. People responding to a telephone poll may allow their answers to be influenced by what they think the questioner wants to hear.

These kinds of things can skew the sample (and hence the results) in decidedly non-random ways that are next to impossible to detect, and that defy “corrections” for demographics and the like.

That’s why I stated it this way in my original comment: “…if the same poll were given many times with the same methodology that 95% of the results would fall within that range?”

## PLaplace said

Completely agreed. Question bias and self-selection are the downfall of many polls. Statistical techniques always assume that bias has been weeded out ahead of time, and that the sample represents a truly random set of observations. And you are correct: if bias has been introduced, there is no way to “correct” it out of the sample. A professionally designed poll, though it can’t remove all bias, can reduce it to little more than background noise; it’s still there, but its effects are negligible. I should stress again, though, that it takes a very, very strong poll or experimental design to do this.

As to your question about “95% of results would fall within that range,” that is very close to true, but not strictly true. It works something like this: we start from the assumption that there exists a true proportion of the population that is going to vote for candidate A, and we call that proportion P. Now, when we take a random sample of the population for polling, the proportion of the sample who say they’ll vote for A is itself randomly distributed. We may happen to phone all A supporters, or we may happen to phone all B supporters; since the sample is random, we can’t tell ahead of time. What we can do, however, is calculate how our sample proportion is likely to be distributed. By the laws of probability, if we take a truly random sample of sufficient size, our sample proportion will very likely be extremely close to the true proportion P (this is roughly what the Weak Law of Large Numbers states). Our margin of error is derived from these laws. Because we can derive how our sample proportion will be distributed, we can calculate a margin of error at 95% confidence based on the theoretical true proportion. This is how the +/- 3 points is derived: that +/- 3 points, along with the sample size, is chosen so that 95% of the time our sample proportion, the response from our poll, will be within 3 points of the true proportion in the population.
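This is easy to check by simulation. The sketch below (with an arbitrary true proportion of 47% and the classic sample size of about 1,067) repeatedly "polls" a population where P is known and counts how often the sample proportion lands within 3 points of it:

```python
import random

def simulate_coverage(true_p=0.47, n=1067, moe=0.03,
                      trials=10000, seed=1):
    """Run many independent simulated polls of size n from a population
    with known true proportion true_p, and return the fraction of polls
    whose sample proportion lands within +/- moe of the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent supports A with probability true_p.
        sample_p = sum(rng.random() < true_p for _ in range(n)) / n
        if abs(sample_p - true_p) <= moe:
            hits += 1
    return hits / trials

print(f"Fraction of polls within +/- 3 pts of P: "
      f"{simulate_coverage():.3f}")
```

The fraction comes out very close to 0.95, which is the precise sense in which "95% confidence" is a statement about the polling procedure, not about any one poll's interval.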

To summarize (or, to channel Obama, “Let me be clear”): it’s not that 95% of results will fall in the range that any one poll states, but that 95% of the time the poll will land within the true range around the true proportion, i.e. 95% of the time the poll will be between P-3 and P+3.

## Dan Draney said

OK. That makes sense. Then they just have to figure out who’s likely to actually vote when the time comes. :)

I encountered a real person who swore that the election results were faked, based on the fact that they didn’t match the exit poll numbers. She couldn’t (or wouldn’t) grasp that the exit poll is an attempt to predict the election result, and not meant to replace the result itself.

## PLaplace said

To that I only have one thing to say.

*Facepalm*