Tag: baseball (page 2 of 2)

One down, three to go

If I am the Cardinals I stop pitching to Carlos Beltran in the playoffs:

Over the last 8 postseason games in which he has faced the Cardinals, Beltran is batting .357 with 5 HRs, 7 RBI, and 12 runs scored–that’s just sick.



So how good is Pedro Martinez?

[Cross posted at Discord and Elaboration]

As a New York Mets fan I can’t help but marvel at the acquisition and the performance of Pedro Martinez this season. True, many doubted whether giving Pedro a 4-year deal was a wise move by the Metsies given his age, loss of velocity, and recent injuries, but so far the investment is paying off. It got me thinking–just exactly how good is Pedro? It seems to me that his antics over the years (the midget mascot, the Don Zimmer throw down, the threat to peg the Bambino in his *ss, etc.) have to some extent lessened the appreciation for just how dominant a pitcher Pedro has been, both in his era and historically. So following Patrick’s lead on the trustworthiness of statistics in baseball (a position I firmly agree with and endorse), I ran the numbers. What they seem to say is that even I underestimated just how good Pedro Martinez is. Arguably, he is the best ever…

I tried to take the stats that are most representative of a pitcher’s individual performance (e.g. ERA, WHIP (walks plus hits, divided by innings pitched), strikeouts per 9 innings) as well as average number of wins per season and winning percentage (so as to normalize the statistics). As far as active pitchers go (those that pitched in 2004 and have over 1000 innings pitched), Pedro is number 1 in ERA (2.71), number 1 in winning percentage (.705), and tied for second in K’s/9 innings (10.4). As if that wasn’t enough, I ran some statistics which paired Pedro up against modern pitchers as well as all pitchers going back to 1871 (again, with at least 1000 IP). His dominance gets even scarier.
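For anyone who wants to check the rate stats themselves, here is a minimal sketch of the three formulas. The function names and the season line are made up for illustration (they are not Pedro's actual numbers), and innings are assumed to be plain decimals rather than the box-score "200.1" notation.

```python
def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


def k_per_9(strikeouts, innings):
    """Strikeouts scaled to a nine-inning game."""
    return 9 * strikeouts / innings


def era(earned_runs, innings):
    """Earned runs allowed, scaled to a nine-inning game."""
    return 9 * earned_runs / innings


# Hypothetical season line, invented for illustration only.
season = {"innings": 200.0, "hits": 150, "walks": 40,
          "strikeouts": 220, "earned_runs": 55}

print(f"WHIP: {whip(season['walks'], season['hits'], season['innings']):.3f}")
print(f"K/9:  {k_per_9(season['strikeouts'], season['innings']):.1f}")
print(f"ERA:  {era(season['earned_runs'], season['innings']):.2f}")
```

All three are rate stats, which is why they can be compared across pitchers with very different career lengths once a minimum innings threshold is applied.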

First let’s look at ERA. Taking the entire league since 1871 reveals just how much of an advantage pitchers had before what I call the “Ruth-effect”. The first player to appear in the rankings who pitched after the 1940s is Hoyt Wilhelm (2.52) at number 44. The next player–Pedro at 77 (2.71). Okay, so let’s take a look at only modern pitchers (post-1940). I went ahead and removed prominent relievers from the mix (since I believe their statistics tell us something different about their performance as opposed to starters, but that is for another time…). Here are the rankings:

Career ERA leaders (post-1940, min. 1000+ innings)

nameLast nameFirst ERA
Martinez Pedro 2.71
Ford Whitey 2.75
Koufax Sandy 2.76
Chandler Spud 2.84
Palmer Jim 2.86
Messersmith Andy 2.86
Seaver Tom 2.86
Marichal Juan 2.89
Gibson Bob 2.91
Brecheen Harry 2.92
Chance Dean 2.92
Hughson Tex 2.94
Drysdale Don 2.95
Maddux Greg 2.95
Cooper Mort 2.97
Stottlemyre Mel 2.97
Hubbell Carl 2.98
Lanier Max 3.01
Dean Dizzy 3.02
Newhouser Hal 3.06
Grove Lefty 3.06
Bonham Tiny 3.06
Johnson Randy 3.07
Veale Bob 3.07
Nolan Gary 3.08
Spahn Warren 3.09
Perry Gaylord 3.11
Horlen Joe 3.11
Gullett Don 3.11
Tudor John 3.12

Pedro comes out number one overall. The next active pitcher to make the top 30 is Greg Maddux at 14, followed by Randy Johnson at 23. Roger Clemens is 39 while Curt Schilling is in the late 70s.

Ok, so what about WHIP?

Against the greatest pitchers of the modern era, Pedro again finishes number 1.

Career WHIP leaders (post-1940, min. 1000+ innings)

Rank nameLast nameFirst WHIP
1 Martinez Pedro 1.028
2 Marichal Juan 1.101
3 Koufax Sandy 1.106
4 Schilling Curt 1.113
5 Seaver Tom 1.121
6 Maddux Greg 1.127
7 Hunter Catfish 1.134
8 Saberhagen Bret 1.141
9 Jenkins Fergie 1.142
10 Sutton Don 1.142
11 Messersmith Andy 1.143
12 Fernandez Sid 1.144
13 Nolan Gary 1.145
14 Drysdale Don 1.148
15 Bonham Tiny 1.153
16 Johnson Randy 1.162
17 McLain Denny 1.163
18 Hubbell Carl 1.166
19 Mussina Mike 1.169
20 Smoltz John 1.169
21 Roberts Robin 1.170
22 Bunning Jim 1.179
23 Palmer Jim 1.180
24 Clemens Roger 1.181
25 Perry Gaylord 1.181
26 Guidry Ron 1.184
27 Candelaria John 1.184
28 Soto Mario 1.186
29 Terry Ralph 1.186
30 Gibson Bob 1.188

Active pitchers fare far better in the WHIP category, but Pedro once again shows that he is among the greatest of all time. Let’s extend the analysis to all pitchers from 1871-2004. This puts Pedro at an extreme disadvantage given that pitchers before the 1940s didn’t have to deal with the DH, juiced balls (and players), smaller ballparks and smaller strike zones. I won’t bog you down with another long chart, so instead I will simply list the top 10:

Rank nameLast nameFirst WHIP
1 Joss Addie 0.968
2 Walsh Ed 1.000
3 Martinez Pedro 1.028
4 Ward John 1.044
5 Mathewson Christy 1.059
6 Johnson Walter 1.061
7 Brown Mordecai 1.066
8 Sweeney Charlie 1.067
9 Russell Reb 1.080
10 Wood Joe 1.085

Pedro is number 3, trailing only Addie Joss and Ed Walsh (if you know these names you are a better person than me). Aside from these two ‘memorable’ players, Pedro is better than two of the most celebrated pitchers in history–Christy Mathewson and Walter Johnson. Bob Gibson, who is widely touted as one of the greatest pitchers in history–one that played in a pitcher’s era–is 108. Yes, Pedro is that good.

Other Measures
Pedro is also quite studly when it comes to K’s per 9 innings and win percentage. Against the all-time greats, Pedro ranks 3rd since 1871 (.705) and 1st among modern-day pitchers for winning percentage and he ranks 2nd all time behind Randy Johnson for K’s/9 at 10.4.

Finally, I decided to create my own metric for comparing pitchers across eras. I took each pitcher and calculated their ranking in terms of Wins per year, SO/9 innings, WHIP, and ERA versus all other pitchers since 1871. The reasoning behind this was to normalize the statistics, since different eras can skew pitchers’ stats in different ways–averaging helps alleviate some of this. I took each player’s ranking for each category and added them together, so if someone had separate rankings of 3, 2, 7, and 14 their total score would be 26. The lower the total score, the better the pitcher. So here are the top 30 pitchers all time according to my metric:
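The rank-sum idea can be sketched in a few lines. This is only an illustration under loud assumptions: it uses five pitchers and three of the columns shown in this post (ERA, WHIP, winning percentage), ranks them against each other rather than against every pitcher since 1871, and gives tied values the same rank. It will not reproduce the totals in the table, and the helper names are hypothetical.

```python
def ranks(values, lower_is_better=True):
    """1-based rank of each value; tied values share the best rank."""
    ordered = sorted(values, reverse=not lower_is_better)
    return [ordered.index(v) + 1 for v in values]


def total_scores(pitchers, categories):
    """Sum each pitcher's rank across all categories (lower is better)."""
    names = list(pitchers)
    totals = {name: 0 for name in names}
    for stat, lower_is_better in categories:
        column = [pitchers[name][stat] for name in names]
        for name, rank in zip(names, ranks(column, lower_is_better)):
            totals[name] += rank
    return totals


# Small illustrative subset; figures are the career numbers quoted in this post.
pitchers = {
    "Martinez": {"era": 2.71, "whip": 1.028, "win_pct": 0.705},
    "Koufax":   {"era": 2.76, "whip": 1.106, "win_pct": 0.655},
    "Seaver":   {"era": 2.86, "whip": 1.121, "win_pct": 0.603},
    "Maddux":   {"era": 2.95, "whip": 1.127, "win_pct": 0.637},
    "Gibson":   {"era": 2.91, "whip": 1.188, "win_pct": 0.591},
}

# (stat name, True if a lower value is better)
categories = [("era", True), ("whip", True), ("win_pct", False)]

totals = total_scores(pitchers, categories)
for name, score in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name:10s} {score}")
```

Because only ranks are added, a pitcher who is merely good in every category can beat one who is spectacular in one and mediocre in another, which is one of the trade-offs of a metric like this.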

Rank nameLast nameFirst ERA SO WHIP Win % Total
1 Martinez Pedro 2.71 2653 1.028 0.705 212.0
2 Waddell Rube 2.16 2316 1.102 0.574 225.8
3 Koufax Sandy 2.76 2396 1.106 0.655 255.8
4 Seaver Tom 2.86 3640 1.121 0.603 311.6
5 Johnson Randy 3.07 4161 1.162 0.658 368.5
6 Johnson Walter 2.17 3509 1.061 0.599 379.9
7 Gibson Bob 2.91 3117 1.188 0.591 396.8
8 Maddux Greg 2.95 2916 1.127 0.637 404.1
9 Drysdale Don 2.95 2486 1.148 0.557 417.9
10 Clemens Roger 3.18 4317 1.181 0.667 423.6
11 Marichal Juan 2.89 2303 1.101 0.631 442.2
12 Walsh Ed 1.82 1736 1.000 0.607 467.9
13 Mathewson Christy 2.13 2502 1.059 0.665 519.9
14 Messersmith Andy 2.86 1625 1.143 0.568 578.8
15 Jenkins Fergie 3.34 3192 1.142 0.557 580.9
16 Overall Orval 2.23 935 1.161 0.603 587.0
17 Bunning Jim 3.27 2855 1.179 0.549 602.2
18 Plank Eddie 2.35 2246 1.119 0.627 613.2
19 Tesreau Jeff 2.43 880 1.145 0.615 614.4
20 Perry Gaylord 3.11 3534 1.181 0.542 619.3
21 Keefe Tim 2.62 2562 1.121 0.603 620.4
22 Sutton Don 3.26 3574 1.142 0.559 622.1
23 Schilling Curt 3.32 2745 1.113 0.599 639.8
24 Chance Dean 2.92 1534 1.212 0.527 648.6
25 Hudson Tim 3.30 899 1.222 0.702 663.3
26 Guidry Ron 3.29 1778 1.184 0.651 676.1
27 Corcoran Larry 2.36 1103 1.105 0.665 680.1
28 Ferguson Charlie 2.67 728 1.117 0.607 681.8
29 Ford Whitey 2.75 1956 1.215 0.690 685.1
30 Ramsey Toad 3.29 1515 1.243 0.479 689.0

According to this, Pedro is indeed the number one pitcher when it comes to personal performance. The metric seems reliable since the names that accompany Pedro are among the most heralded to ever play.

Anyway, I am sure there are a number of problems with my measures but in either case I hope this illustrated just how dominant Pedro is in relation to the greatest pitchers of all time. We are certainly lucky to have him in NY even if it is the tail-end of his career. By the way, so far this year Pedro is 7-1 (and he should be 9-1 since the bullpen blew two leads for him late) with a 2.45 ERA, 104 K’s, only 13 walks and–wait for it–a WHIP of 0.67 which is, to say the least, ridiculous.



(Baseball) Statistics Never Lie

[cross-posted at ProfPTJ’s Course Diaries]

One of the things that I find the most fascinating about the game of baseball is the fact that statistical data about the performance of players and teams is actually meaningful. This is so largely because the kinds of things being measured — whether a team wins or loses, how many balls and strikes a pitcher throws, what percentage of the time a player gets on base as opposed to making an out — involve a repetition of the same basic actions a sufficient number of times that random fluctuations cancel out. Players do the same basic things enough times that over the course of a season their ability to do those things (like get the bat on the ball, throw a strike, and so forth) will be reflected in their numbers.

For example, every “plate appearance” that a batter has over the course of a season is roughly similar to every other plate appearance in its basic contours, and over the course of a 162-game season the average player can expect to come to the plate about 500 times — a sufficiently “large n” that saying that a batter has an on-base percentage of .446 and a slugging percentage of .536 is a meaningful statement.1 Contrast this to “football statistics,” which are based on a regular season of only 16 games; players rarely get sufficient chances to do things like catch passes and rush for yardage to make meaningful quantitative comparisons possible. This doesn’t stop people from making those comparisons, and playing “fantasy football” based on them, but I’ll keep my statistics operating in realms where they make some sense, thank you very much.

The fact that baseball statistics are meaningful allows observers of the game to conduct very precise analyses of how well their teams and players are doing. Just for kicks, this morning I plugged some numbers into a spreadsheet to make a rudimentary calculation not about which teams were doing the best in terms of wins and losses — that information is readily available in any major newspaper, and all over the web (for instance, here) — but about which teams were performing most efficiently. I took information about the 2005 payrolls of all 30 major league teams from this site, and had Excel calculate approximately how much money each team was paying for each of the wins it had thus far achieved this season.2

The results are interesting, although I won’t bore you with all of the details. The important results are these:

  • the Yankees have the most expensive wins, at $2,571,689.10 per win; the Devil Rays have the cheapest, at $498,447.13. This is not a major surprise, since the Yankees’ overall payroll is about seven times as large as the Devil Rays’ payroll.
  • what is surprising is that the Yankees are paying almost twice as much for a win as the next team in the list, the Boston Red Sox. And the Red Sox are doing better than the Yankees in the overall win-loss standings.
  • of the teams whose winning percentage is .500 or greater, the three teams paying the least for their wins thus far this season are the Toronto Blue Jays ($535,243.19), the Washington Nationals ($568,748.94), and the Minnesota Twins ($574,432.48). The payrolls for all three teams are in the bottom third of all major league teams.

What this tells me is not just that the Yankees are playing crappy baseball this year (that much I knew already) but precisely how crappy their play has been. It also underscores precisely how impressively the Washington Nationals have been playing; they’re making a little bit of salary go a very long way.

The analysis also demonstrates, pretty concretely, that simply spending money on a baseball team doesn’t guarantee you success. You also have to use that salary efficiently, and get sufficient bang for your buck. The Yankees are spending about $208 million and thus far have a 27-27 win-loss record; the Nationals’ total payroll is about $48.5 million, and their record is 29-26. Put another way, the Nationals are spending 1.171% of their salary for each win, while the Yankees are spending 1.235%; the difference may not look like much, but over a 162-game season, minor variations between teams and players translate into major differences.
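The "price per win" arithmetic described in the second footnote (payroll divided by three, then by wins) is easy to sketch. This is a rough illustration rather than the original spreadsheet: the function name is hypothetical, and it uses the rounded payroll figures quoted above, so the results land near, but not exactly on, the per-win figures in the list.

```python
def price_per_win(payroll, wins, season_fraction=1 / 3):
    """Payroll spent so far (payroll * season_fraction) divided by wins so far."""
    return payroll * season_fraction / wins


# Rounded payrolls and win totals as quoted in the post.
teams = {
    "Yankees":   {"payroll": 208_000_000, "wins": 27},
    "Nationals": {"payroll": 48_500_000, "wins": 29},
}

for name, team in teams.items():
    cost = price_per_win(team["payroll"], team["wins"])
    print(f"{name:10s} ${cost:,.2f} per win")
```

The `season_fraction` default encodes the assumption that the season is about one third complete; change it to rerun the numbers later in the year.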

The fascinating thing is that in baseball we can determine precisely how much difference these things make. I’d be very resistant to running numbers like this in most other situations, but in baseball, bring on the quantitative analysis!

1 On-base percentage measures how often a batter reaches base successfully, whether through getting a hit or drawing a walk or being hit by a pitch; slugging percentage measures the total number of bases that a player reaches on his hits, divided by his at bats; details on basic baseball stats can be found here. The specific numbers that I used for this example are Nick Johnson’s stats for the present season. The figure of 500 plate appearances is approximately the minimum number required to qualify for a batting title under the present 162-game regular-season schedule.
2 We’re approximately 1/3 of the way through the season at the moment, so if we divide each team’s salary by 3 and then divide that number by the number of wins, we get the effective “price per win” that each team is paying. Note that this does not include the salaries for managers, coaches, etc., but is limited to the combined salaries of all the players on the team’s payroll.


“Momentum” in Baseball

As a baseball fan, I watch a fair number of games during the regular season, most of them on television (ballpark prices, even for the cheap seats, prohibit more than about one family outing a month to see a game live). A lot of the commentary by the sportscasters is silly, although the anecdotes told by former players and the locker room gossip that reporters have managed to pick up are sometimes interesting. But the analysis of what is going on in the game or in the season as a whole is, by and large, silly, since it deploys questionable analytical strategies and dubious evidence, such as the reverence that many commentators have for strategies like the sacrifice bunt — strategies that are only rarely reasonable moves to make during a game. So what we often get from sportscasters is nothing more than speculation, sometimes speculation informed by experience, but more often speculation informed by some unsupported general sense of how the game should be played.

The worst abuse of technical language that I have encountered during these sportscasts, though, involves the word “momentum.” Time and time again I hear assorted commentators discussing the “momentum” that a team has, as though “momentum” were an attribute of a baseball team and could be reliably utilized as a basis upon which to make predictions about the future performance of that team. “Momentum,” in these folk analyses, is treated as a mechanism that explains an object’s trajectory through space, with the object being the team and the space being the regular baseball season. And partisan commentators regularly admonish their team to “keep the momentum going” or to “get some momentum up” by winning several games in a row, a locution that presumes that a team could somehow alter its momentum through more or less deliberate action and thus produce more wins.

This is a profoundly misleading notion, and the precise ways in which it is misleading might serve as an object lesson for other attempts at predicting the outcomes of social processes.

Momentum, strictly defined, is a measurement of an object’s motion that defines, in effect, how hard it is to stop that object. Momentum is calculated by multiplying an object’s mass by its velocity, where velocity is a vector quantity that describes both the speed and the direction of motion.1 Two things are highly relevant here: first, momentum is a description of motion, and only “explains” it in the technical sense of providing a way to calculate an object’s position at time t+1 given its position at time t and a knowledge of its momentum at time t; and second, like any analytical operation, a prediction based on such knowledge has an enormous ceteris paribus clause: the model deliberately excludes a lot of context, and presumes that it isn’t relevant to the outcome in question.
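To make the physical definition concrete, here is a minimal sketch of momentum and the one-step position update it licenses. Everything in it is an illustrative assumption: the function names are hypothetical, the ball's mass is taken as roughly 0.145 kg, and gravity and air resistance are deliberately ignored, which is exactly the kind of ceteris paribus clause described above.

```python
def momentum(mass, velocity):
    """Momentum p = m * v, computed component by component."""
    return tuple(mass * v for v in velocity)


def next_position(position, p, mass, dt=1.0):
    """Position after dt seconds, assuming momentum stays constant.

    Ignores gravity, drag, and every other outside influence.
    """
    return tuple(x + (pi / mass) * dt for x, pi in zip(position, p))


# Assumed values for illustration: a ball of about 0.145 kg moving at 40 m/s along x.
mass = 0.145                     # kilograms
velocity = (40.0, 0.0, 0.0)      # metres per second; direction matters
p = momentum(mass, velocity)

print("momentum:", p)
print("position after 0.5 s:", next_position((0.0, 0.0, 0.0), p, mass, dt=0.5))
```

The update only works because mass and momentum are known and fixed between t and t+1; the rest of this post is about why nothing comparable is available for a baseball team.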

What do these qualifications mean in practice? The first qualification makes something of a mockery of the idea that a team can improve its record by changing its “momentum,” since momentum isn’t so much a causal mechanism producing outcomes as it is an element (and only one element) of an analytical system explaining outcomes. In order to produce outcomes, we’d need to know what mechanism or mechanisms account for whatever “momentum” a team has at a given point in time, so that we could modify those mechanisms directly. “Momentum” is, at best, an intervening variable.

And that’s only at best, because the second qualification (the ceteris paribus clause associated with all attempts to predict based on a model) renders the notion of momentum pretty much unusable for social phenomena. Think for a moment about what is presumed when someone says that a team has “momentum”: that baseball teams are akin to solid physical objects (like the baseballs that they toss around the field), that their direction of flight through the air is as definable as that of any physical object, and that the parameters of the system within which the objects are flying remain more or less fixed over the whole course of the period in question.

Take these each in turn:

Are baseball teams like solid physical objects? Well, given the frequency of trades, roster moves, line-up shuffles, and the like, I’d have to say “no.” Calculating the mass of a baseball team — an essential part of the momentum formula — is thus pretty much impossible, and would be roughly akin to trying to deal with the flight of a baseball whose mass shifted while in flight. And even if the overall mass (however calculated) remained constant, the continual adjustments to the roster and the lineup would make it impossible to precisely describe the velocity of the team, since the simplifying assumption (that mass is concentrated in the center of an object) ordinarily used to approximate such things would no longer be empirically valid.2

Is direction of flight precisely calculable? Well, with a baseball moving through the air, in principle, yes, because we can set up radar guns and laser trackers and the like. And once we have a general equation describing the path of a ball in flight, we can use that as a basis on which to predict what is going to happen the next time a ball is thrown, assuming for the moment that nothing fundamental will have shifted by then (such as a hurricane suddenly coming in, or gravity in the stadium altering, etc.). But with a baseball team, we have an epistemological endogeneity problem, in that players and managers (and owners … especially intrusive owners like George Steinbrenner, czar of the Yankees) read the analyses offered and can factor those analyses into their ways of proceeding. This means, in effect, that the direction of flight of a social object like a baseball team (or the global political economy) can’t actually be precisely measured. We can describe the instantaneous rate of change of the object’s position3 at some past point in time, but describing it in the present always runs the risk that this very description will alter the object’s velocity, as the team takes concrete steps to change the direction of its flight.

Antonio Gramsci, of whom I am not the biggest fan, nonetheless put his finger on it when he pointed out that predictions in the social world were impossible because of the ways in which such predictions interacted with the people whose trajectory was being predicted. Likewise, Isaac Asimov nailed this problem in his Foundation series of novels when he made it a condition for the success of predictions of the human future that the subjects of the calculations be kept ignorant of the details of the calculations. Whatever we are doing when we make predictions about the social future, it isn’t the same thing as physical scientists do when extrapolating the trajectory of a physical object.

Do the parameters of the system remain intact over time? Well, in baseball, I think the answer has to be a qualified “yes” — yes because there are whole organizations set up to prevent the parameters from shifting, and qualified because some of the fundamentals (like the height of the pitcher’s mound and the precise boundaries of the strike zone) have in fact changed over time. So if one could overcome the other problems, which I don’t think that you can, there might be a prima facie case for extrapolating from a team’s past performance to its future performance, and giving a notion like “momentum” some (pardon the pun) real analytical weight.4

Hence: predict baseball outcomes with extreme caution, and avoid notions like “momentum.” Specifying actual mechanisms that produce wins is a much more analytically defensible way to go.

That having been said, if the Yankees keep playing the way that they have been the past few weeks, they’ll win the AL East again, and I doubt that the wildcard will come from the East this year; the AL Central has so many bottom-feeding teams that I suspect that both Minnesota and Chicago will amass records sufficient to get them to October.

1 Even though most baseball commentators only use “momentum” to describe a team that has strung together a series of victories, there is no logical reason why one couldn’t similarly say that a team on a losing streak had “momentum” in the opposite direction. That is, if you wanted to use “momentum” to describe such things in the first place.
2 Imagine a baseball the interior of which was actually composed of a series of small metal weights, the position of which was affected by a magnetic field generated by a device at the center of the baseball. Imagine further that this magnetic field could shift. Even though the overall mass of the baseball would remain constant, its precise character could be shifted in small ways by redistributing the weights inside of the ball; moving the baseball’s center of gravity would affect the character of its velocity, and hence the character of its momentum. Now, apply this to a baseball team composed of many more moving parts.
3 Technically, this is what velocity is: the first derivative of position with respect to time.
4 The closest thing I know of that does this is the so-called “Pythagorean expectation” formula invented by sabermetrics guru Bill James. But this formula differs from a “momentum” formula in that it incorporates actual mechanisms that generate wins (runs scored and runs allowed) and uses those as a basis to forecast a team’s performance.
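For reference, the Bill James formula mentioned in the last footnote is commonly written as runs scored squared, divided by the sum of runs scored squared and runs allowed squared. A minimal sketch follows; the function name and the run totals are made up for illustration, and the exponent of 2 is the classic version (later refinements use slightly different exponents).

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2):
    """Expected winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical team: 800 runs scored, 700 allowed over a full season.
expected_pct = pythagorean_expectation(800, 700)
print(f"expected winning percentage: {expected_pct:.3f}")
print(f"expected wins over 162 games: {expected_pct * 162:.1f}")
```

Note that the inputs are the things that actually produce wins (scoring runs and preventing them), not the team's recent string of results.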


© 2021 Duck of Minerva
