Statistics, Baseball, and Essentialism, or why A-Rod should win the AL MVP

9 October 2005, 0300 EDT

Readers of some of my other posts expressing a certain skepticism about the value of statistical analysis might be surprised to learn that when it comes to baseball, I’m a devoted “seamhead,” one of those technically inclined fans who thinks that quantitative analysis is the best way to figure out what works and what doesn’t in the game. In particular, I am of the opinion that one can only evaluate truly impressive performances through quantitative analysis; while I might disagree with Bill’s claim that Pedro Martinez is the best pitcher ever, I certainly can’t fault his logic, although I can disagree with his criteria for greatness. (I’d rate longevity as more important than Bill does, but that’s a debate for another time.)

I’d argue that the numbers are pretty unambiguous about who should get this year’s American League MVP award; his name is Alex Rodriguez, and he plays third base for the Yankees. And the numbers, or at least the numbers that really matter, say that David Ortiz, the Red Sox designated hitter who is frequently mentioned as an MVP candidate, shouldn’t get the award. Yes, you might expect me to say this, given that I make no secret of the fact that I’m a Yankees fan who experiences great schadenfreude at the Red Sox’s perennial struggles. But in this case, I have defensible reasons.

But first, given the discussion about my post on essentialism, I think that I need to take a couple of paragraphs to explain why I put so much stock in baseball statistics.

Let’s review: I maintained, and continue to maintain, that statistical reasoning is essentialist in that it treats correlations as evidence of some kind of intrinsic connection between independent and dependent variables. Certainly, many if not most statisticians have causal stories about their correlations that suggest that something other than the variables themselves is responsible for the observed correlations, but that strikes me (and Andrew Abbott, whose analysis in his article on “general linear reality” ought to be required reading for anyone who wants to really engage in this debate) as somewhat beside the point. What matters is that the very techniques of statistical analysis presume, in a very fundamental manner, that general conceptual categories account for outcomes. That’s the way that the math works; it’s not optional.

So why do I have such a problem with analysis like that applied to the social world in general, but absolutely no problem with such essentialism applied to baseball? I won’t try to argue that baseball statistics aren’t essentialist; they are, since the whole point is to correlate people’s names with various outcomes over time. In fact, I’d say that the entirety of regular-season baseball can be thought of as a social machine designed more or less intentionally to simulate essences by putting the same people in the same situations over and over and seeing what emerges. Regular-season baseball is a giant laboratory, in which the same experiments are run repeatedly and the results tracked very precisely. And in such an environment, the best tool for measuring the results is quantitative statistical analysis.

Two quick clarifications. First, I say “simulate” rather than “reveal” essences because I am reserving judgment about whether people “really are” .300 hitters or 20-game winners outside of the context of baseball; I don’t even know how you would know that in the first place. Nor do I think that it matters whether people are good players “intrinsically” or whether the process of playing baseball and selecting people to move up from the minors to the majors produces those players; all that matters is that, within the environment of regular-season baseball, certain players perform in certain ways. Leave the fundamental ontology aside and focus on observed performance. Second, one important difference between the affected laboratory of regular-season baseball and the laboratories used in the physical sciences is that physical science laboratories are designed to reveal something about the world outside of themselves, while regular-season baseball is designed to reveal nothing about any world other than its own. Someone hitting .350 one year tells us nothing whatsoever about how that person would swing a golf club, or shoot a pistol, or perform any other kind of activity, which is in part why we can look at baseball statistics without making any fundamental ontological claims about precisely what is going on.

Enough philosophy — the point is that statistical analysis is appropriate for regular-season baseball just because regular-season baseball constitutes precisely the kind of situation in which you would expect such analysis to be revealing. So what do the statistics tell us about who ought to win the Most Valuable Player award in the American League this year? Let me just focus on the two widely-hailed front runners, Ortiz and Rodriguez. Their season numbers:

                       Rodriguez   Ortiz
Games played:                162     159
At-bats:                     605     601
Hits:                        194     180
Runs:                        124     119
Home Runs:                    48      47
Runs Batted In:              130     148
On-base percentage:         .421    .397
Slugging percentage:        .610    .604

We can see that their numbers are quite similar in almost every category, but Ortiz has the edge in Runs Batted In while Rodriguez has the edge in on-base percentage. And whenever I hear Ortiz discussed as an MVP candidate, the person bringing up his name invariably mentions RBI. This is silly. RBI, as every thinking fan should know, is what happens when a good hitter (which Ortiz certainly is) comes up to bat with runners on base. This means that RBI statistics are not just a measure of how well the hitter has done, but also in part a measure of how well the people ahead of him have done. Now, if hitters generally hit differently with people on base, then I could understand using RBI as a measurement of how productive someone was at the plate, but numerous studies using a century of data have pretty conclusively demonstrated that nobody hits significantly better or worse with runners on base than they do in other situations. So RBI is a bad category to base a player’s value on.
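
To make the dependence on teammates concrete, here is a minimal Python sketch. The function name `expected_rbi` and every number fed to it are hypothetical, chosen only for illustration; it is a toy model in which a hitter drives in a fixed fraction of the runners he finds on base, not a description of either player's actual season.

```python
def expected_rbi(plate_appearances, runners_on_per_pa, drive_in_rate):
    """Toy model: RBI = opportunities x the hitter's own conversion rate.

    All three inputs are hypothetical illustration values, not real data.
    The model ignores the run a batter drives in by scoring himself on a
    home run.
    """
    opportunities = plate_appearances * runners_on_per_pa
    return opportunities * drive_in_rate


# Same hitter skill (drive_in_rate), different lineups batting ahead of him.
SAME_SKILL = 0.20  # hypothetical fraction of runners driven in
weak_lineup = expected_rbi(700, 0.55, SAME_SKILL)
strong_lineup = expected_rbi(700, 0.75, SAME_SKILL)

print(f"Weak lineup ahead:   {weak_lineup:.0f} RBI")
print(f"Strong lineup ahead: {strong_lineup:.0f} RBI")
```

In this toy setup the hitter's own skill never changes, yet his RBI total moves with the number of runners his teammates put on base for him.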

What about on-base percentage? Well, we know that the single most important thing that a player can do to help his team win games is to get on base. The numbers are pretty conclusive, especially if we think about things like the “expected run value” of various configurations of runners on base; getting on base gives your team more chances to score, and in the end that’s half of how you win games. (The other half is pitching . . . actually, pitching is probably a little more than half, but that’s a different discussion.) Given this, A-Rod’s .421 on-base percentage demonstrates that he was more valuable to his team as a hitter than Ortiz was with his .397.
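
As a rough sketch of the arithmetic behind that comparison, the Python below re-derives batting average from the hits and at-bats in the table and converts the on-base gap into extra times on base. The table gives at-bats, not plate appearances, so the figure of 700 plate appearances is an assumption made purely for illustration, as are the variable names.

```python
# Season totals copied from the table above.
players = {
    "Rodriguez": {"at_bats": 605, "hits": 194, "obp": 0.421},
    "Ortiz": {"at_bats": 601, "hits": 180, "obp": 0.397},
}

# Batting average is hits divided by at-bats.
for name, stats in players.items():
    stats["avg"] = stats["hits"] / stats["at_bats"]
    print(f"{name}: AVG {stats['avg']:.3f}, OBP {stats['obp']:.3f}")

# ASSUMPTION: 700 plate appearances each (this number is not in the post).
ASSUMED_PA = 700
obp_gap = players["Rodriguez"]["obp"] - players["Ortiz"]["obp"]
extra_times_on_base = obp_gap * ASSUMED_PA
print(f"OBP gap of {obp_gap:.3f} is roughly "
      f"{extra_times_on_base:.0f} extra times on base")
```

Under that assumed workload, a gap of .024 in on-base percentage works out to somewhere around 17 additional times on base over a season.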

The other thing that I always hear about Ortiz is that he’s such a “clutch” player, that he always “comes through in the clutch,” always “gets the big hits,” etc. If RBI is silly, the notion of a “clutch player” is downright ridiculous, since there is no such thing as a clutch player! As Bill James once put it, there are certainly clutch plays, but no players are inherently better in demanding, high-pressure, game-on-the-line situations than they are in others. Good hitters are good hitters, period. And there is a great deal of statistical information out there dissolving the idea that there are clutch players, along with the related idea that a player’s numbers at the plate with “runners in scoring position” mean anything at all. The problem is that almost no one comes up to the plate often enough in those situations for us to see whether or not they hit significantly differently, and when players do come up that often, their numbers are not significantly different from those that they put up in other situations.
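
The sample-size problem can be sketched with the standard error of a proportion, sqrt(p(1-p)/n). The Python below treats each at-bat as an independent trial with a fixed hit probability, which is a simplifying assumption; the at-bat counts of 600 and 150 and the .300 average are hypothetical round numbers, not figures for any real player.

```python
import math


def batting_avg_standard_error(true_avg, at_bats):
    """Standard error of an observed batting average under a binomial model."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


TRUE_AVG = 0.300  # hypothetical hitter

samples = [("full season", 600), ("runners in scoring position", 150)]
for label, at_bats in samples:
    se = batting_avg_standard_error(TRUE_AVG, at_bats)
    # Two standard errors either side covers roughly 95% of chance outcomes.
    low, high = TRUE_AVG - 2 * se, TRUE_AVG + 2 * se
    print(f"{label}: n={at_bats}, SE={se:.3f}, roughly {low:.3f} to {high:.3f}")
```

Under these assumptions, a true .300 hitter could post anything from about .225 to .375 over 150 at-bats by chance alone, which is why a single season's split in those situations says so little.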

So why does this myth persist, and why might it end up sufficing to get David Ortiz an award that he doesn’t deserve this year, great hitter though he is? Since baseball is made up of repeated actions and is structured like a laboratory, normal occurrences simply fade into the background. What stands out are the rare, significant, special plays: those out-of-the-ordinary events that are memorable precisely because they stand out from the rest. So we remember the dramatic game-winning home run, the epic at-bat, the perfect game, and forget the rest. Now, other things being equal, a good hitter is more likely to have those kinds of dramatic moments over time, since he is hitting well. So it’s not surprising that guys like David Ortiz provide memorable moments over the course of a season, and if they’re lucky, then there will be several of them in succession. And it is luck, not some intrinsic “clutchness,” since there ain’t no such thing. [If there were, it would show up in the numbers, and it doesn’t, so there isn’t.]

For these reasons I think that the MVP award should go to Alex Rodriguez, in recognition of his performance over the course of the regular season. The nice thing about the way that baseball is organized is that it makes it possible for us to focus on performance and to reward people simply for what they have done, without inquiring too deeply into why they have done it and getting ourselves tied up in fundamental ontological knots. Indeed, this is also how baseball statistics preserve agency: the fact that we don’t know why someone did what they did permits us to assign responsibility for the performance to them, instead of assigning it to some contextual or environmental factor.

But the Astros have just beaten the Braves, just as I predicted that they would, and so it’s time for me to go to bed and stop rambling.