Thursday, March 14, 2013

How to use exit polls to mislead: Part 2

I embrace statistics, but I also recognize when statistics can be misleading. Some people like to say that you can make statistics say whatever you want if you know how to manipulate them. That is true to an extent. But you can’t make something categorically untrue suddenly true by manipulating statistics. This is obvious to almost everyone: no matter how you manipulate the statistics, you can’t prove that Eddie Gaedel is a greater power hitter than Babe Ruth. What you can do is manipulate statistics to support half-truths, or claims that are technically true but misleading.
Take the statement that Mark McGwire is a better power hitter than Babe Ruth. How would one go about proving it? You would construct an argument that sets out to prove it. This could take any number of forms, but we’ll use a fairly simple one to show how the manipulation of stats works.
1. The greatest power hitter is the hitter who needs the fewest at-bats to hit the most home runs (or, more simply, the greatest power hitter has the fewest at-bats per home run).
2. Mark McGwire hit a home run every 10.61 at-bats.
3. Babe Ruth hit a home run every 11.76 at-bats.
4. Therefore, Mark McGwire is the better power hitter.
According to this argument, Mark McGwire is the better power hitter. Is there a problem with this argument? It is obviously a valid argument, which means that the truth of the premises entails the truth of the conclusion. Is it a sound argument? Maybe. It depends on how you feel about the first premise. Maybe at-bats per home run is an inadequate way to measure the ability of a power hitter. Consider the following:
1. The greatest power hitter is the hitter who needs the fewest at-bats to hit the most home runs (or, more simply, the greatest power hitter has the fewest at-bats per home run).
2. Russell Branyan hit a home run every 15.12 at-bats.
3. Hank Aaron hit a home run every 16.38 at-bats.
4. Therefore, Russell Branyan is a greater power hitter than Hank Aaron.
Of course, now the argument seems ridiculous on its face. It might violate our common-sense principles, but we could certainly use statistics in a way that proves Branyan is a greater power hitter than Hank Aaron. So now suppose we want to disprove that Branyan is a greater power hitter than Aaron. An easy way to do this is the following:
1. The greatest power hitter is the hitter who has the most home runs in his career.
2. Hank Aaron hit 755 home runs in his career.
3. Russell Branyan hit 194 home runs in his career.
4. Therefore, Hank Aaron is the greater power hitter.
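The trick in the last two arguments is that the winner is decided the moment the first premise picks a statistic. Here is a minimal Python sketch of that idea, using only the figures quoted above for Aaron and Branyan; the `greatest` function and the field names are hypothetical, just a way of writing "the first premise" as a parameter.

```python
# Figures quoted in the post: career at-bats per home run and career home runs.
players = {
    "Hank Aaron": {"ab_per_hr": 16.38, "career_hr": 755},
    "Russell Branyan": {"ab_per_hr": 15.12, "career_hr": 194},
}

def greatest(players, metric, lower_is_better=False):
    """Return the player who 'wins' under the chosen first premise (metric)."""
    pick = min if lower_is_better else max
    # Iterating a dict yields its keys (player names); rank them by the metric.
    return pick(players, key=lambda name: players[name][metric])

# Premise: the fewest at-bats per home run wins.
print(greatest(players, "ab_per_hr", lower_is_better=True))  # Russell Branyan
# Premise: the most career home runs wins.
print(greatest(players, "career_hr"))  # Hank Aaron
```

Same two players, same numbers, opposite verdicts; the only thing that changed is the premise.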
All done. We’ve now proved that Aaron is a greater power hitter than Branyan. By now you should realize where I’m going with this. Does the argument hold up in all cases?
1. The greatest power hitter is the hitter who has the most home runs in his career.
2. Chili Davis hit 350 home runs in his career.
3. Hank Greenberg hit 331 home runs in his career.
4. Therefore, Chili Davis is a greater power hitter than Greenberg.
This doesn’t seem as ridiculous as Hank Aaron being a lesser power hitter than Russell Branyan, so perhaps we’re on the right track. It’s possible that some people even consider Davis a greater power hitter than Greenberg. But just for kicks, since I believe Greenberg is the greater power hitter, how would one go about proving that he is greater than Davis? One way would be to make the first premise that the greatest power hitter is the hitter who hits the most home runs in a single season. Of course, that premise would eventually also say that Luis Gonzalez is a greater power hitter than Hank Aaron (Gonzalez hit 57 home runs in 2001; Aaron never hit more than 47 in a season). But I’ll skip a bunch of hypothetical arguments and go with this one.
1. The greatest power hitter is the hitter who hits the most home runs per 162 games played.
2. Mark McGwire has the highest mark by that measure, at 50 home runs per 162 games played.
3. Therefore, Mark McGwire is the greatest power hitter.
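Home runs per 162 games is a rate stat: it scales a career total to the length of one full season, so a player with a shorter career isn't penalized just for playing fewer games. Below is a minimal sketch of the arithmetic, assuming the measure is simply career home runs times 162, divided by games played (the post doesn't spell out the formula). The stat lines are hypothetical and invented for illustration, not any real player's numbers.

```python
def hr_per_162(home_runs, games_played):
    """Scale a career home-run total to a 162-game season."""
    return home_runs * 162 / games_played

# Hypothetical stat lines, invented for illustration only.
player_a = hr_per_162(350, 2400)  # more career home runs, longer career
player_b = hr_per_162(331, 1400)  # fewer career home runs, shorter career

print(hr_per_162(300, 1620))  # 30.0
print(player_b > player_a)    # True
```

Under this premise Player B comes out ahead despite having fewer career home runs, the same kind of reversal that lets Greenberg beat Davis.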
Hank Greenberg beats Chili Davis by this measure, but that should be obvious. We’re back where we started: maybe Mark McGwire is the greatest power hitter. Of course, this ignores a whole lot of context. McGwire played in an era when home runs were much more prevalent, and he may have had the help of performance-enhancing drugs. Neither factor shows up in the statistics we used. It’s fairly difficult to capture context with statistics.
For the most part, when we build an argument aimed at showing one thing, we ignore other potential problems with it. We're working backward from a conclusion instead of letting the facts fall where they may and forming a conclusion from them. This is obviously a problem, but how do we solve it?
