Saturday, December 02, 2006

Statistical Shenanigans

I had just barely begun surfin' the 'net this morning when a headline grabbed my attention:

Americans Drive Less for First Time in 25 Years

Intrigued, I followed the link and read the article. The headline is based on the following statistic:

  • Average miles per driver in 2004: 13,711
  • Average miles per driver in 2005: 13,657

For those of you who may be, as I am, mathematically challenged, that is a drop of 54 miles driven from one year to the next. WOW! Tremendous! Surely this is indicative of a major shift in gas consumption and driving patterns!

Now, Dave can tell you that I am no statistician. I sweated my way through my mandatory statistics course in graduate school and worked harder in that course than in just about any other course I ever took. But even I know that a drop of 4/10 of 1 percent is not statistically significant. Which brings me to one of Dave's and my pet peeves: misuse of statistics.
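For anyone who wants to check that figure, here is a minimal sketch of the arithmetic in Python. The two mileage numbers come straight from the article; the variable names are my own, chosen only for illustration.

```python
# Average miles per driver, as quoted in the article.
miles_2004 = 13_711
miles_2005 = 13_657

# Absolute drop from one year to the next.
drop = miles_2004 - miles_2005

# The same drop expressed as a percentage of the 2004 figure.
pct_drop = drop / miles_2004 * 100

print(f"Drop: {drop} miles")             # Drop: 54 miles
print(f"Percent drop: {pct_drop:.2f}%")  # Percent drop: 0.39%
```

That works out to roughly 4/10 of 1 percent, as stated above.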

Every day, one can scan newspapers, web sites, blogs, books, magazine and journal articles (even academic ones - remember the famous Bell Curve controversy of the 90s, stimulated by the faulty research of a Harvard professor and his co-author?) and observe the persistent mauling, manhandling, misuse, abuse and slaughter of statistics. I think there are two reasons why people routinely get away with so many statistical shenanigans.

First, people are easily impressed with numbers. If something can be counted, weighed, measured and illustrated with a bar graph, we love it. Speaking philosophically for a moment, I think this impressionability is rooted in a deep-seated human desire for control of our environments - social, physical, professional, etc. We believe that if we can name it and quantify it, we can gain mastery over it. Some semblance of control is psychologically necessary if we are to function in a chaotic, precarious world in which, frankly, most of what happens around us and to us is beyond our control.

Second, even though people are impressed with number play (especially graphs), most of them have little understanding of how statistics actually work, of how they can be used fruitfully and how they can be misused deceitfully (or even maliciously). Thus, it is easy for researchers, writers, journalists, etc., to express something numerically or graphically so that it sure looks and sounds authoritative, so, golly gee, it's gotta be right!

Do you need to take a post-secondary course in math to see through this stuff? Probably not. You learned enough in high school math to crunch most of the everyday statistical numbers you encounter for yourself. And if you took a post-secondary math course, you're all set. Do you need to take a post-secondary course in logic to see through this stuff? That would actually be more useful than a math course, but it's probably not necessary. The most important thing I learned in graduate school is that statistics are more closely related to logic than to math. So, if you see some numbers that don't seem to "make sense" to you, do some basic math to check the figures for yourself. Once you've done that, examine the figures logically to determine if there are any other interpretations that can be applied to the same figures. Then you'll be able to determine if the numbers you are reading are lies, damned lies, or statistics.


Dave said...

Misuse of statistics is an indication either that the writer does not realize he or she is moving beyond his or her expertise, or that the writer is trying to obfuscate because the position is on shaky ground. In this case, the writer either was trying to create a story that does not exist or has a bias and is looking for anything that would seem to support the position.

Anonymous said...

I always find it amusing how people can use the same statistic to "prove" totally different points of view.

Erik said...

I teach statistics at a bachelor's degree school. Your view on "significance" is wrong because this concept only applies to samples and tests. The figure about driven miles is, I assume, not a test result from a sample, but an average calculated on all driven miles. If this calculation is correct, then Americans have indeed, on average, driven less than in previous years. A second relevant matter is fuel usage, because if the cars use more fuel than before, the environmental effect is zero or less. But for the rest you are right: few things are more abused than statistics. Take the election polls. In Holland, my country, we have 12-15 parties that compete for a share of the 150 seats in our parliament. It is almost impossible to have a random sample from which numbers of seats can be forecast for the coming elections: each party has a confidence range from 50-80%, all parties together will result in lesser confidence intervals. Time and again these polls are held, time and again their failures are proven, but people simply want to have them. Mundus vult decipi, the world wants to be cheated. But if statistics are applied appropriately then they are extremely useful, e.g. in forecasts of customers and production, in physical, psychological, biological and other research, and in market research (provided the researcher is honest and doesn't promise more reliability than the (cheap) methods can guarantee).