PoliSci-unrelated post of the day: Visualizing Major League Baseball, 2001-2010

21 December 2010, 1620 EST

This post originally appeared at Beyond the Box Score.  If you are a baseball analysis fan and don’t already read BTBS, I highly recommend it.

2010 marks the end of the “aughts” decade for Major League Baseball.  I thought I would take the opportunity to analyze the last 10 years by visualizing team data.  I used Tableau Public to create the visualization and pulled team data from ESPN.com (on-field statistics) and USA Today (team payroll).

The data is visualized through three dashboards.  The first visualizes the relationship between run differential (RunDiff) and OPS differential (OPSDiff) as well as the cost per win for teams.  The second looks at expected wins and actual wins through a scatter plot.  The size of each team’s bubble represents the absolute difference between its actual and expected wins.  Teams lying above the trend line were less lucky than their counterparts below the trend line.  The final tab presents the relevant data in table form and can be sorted and filtered along a number of dimensions.

The first visualization lists all 30 teams and provides their RunDiff, OPSDiff, wins, and cost per win for 2001-2010.  The default view lists the averages per team over the past 10 years, but you can select a single year or range of years to examine averages over that time frame.  The visualization also allows users to filter by whether teams made the playoffs, were division winners or wild card qualifiers, won a championship, or were in the AL or NL.  The height of the bars corresponds to a team’s wins (or average wins over a range of years).  The color of the bars corresponds to a team’s cost per win: the darker green the bar, the more costly a win was for that team.  Total wins (or the average for a range of years) is listed at the end of each bar.  To create the bar graph, I normalized the run and OPS differential data (added the absolute value of each score + 20) to make sure there were no negative values.  For the decade, run differential explained about 88% of the variation in wins and OPS differential explained about 89% of the variation in run differential.
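
A minimal Python sketch of the two calculations in this paragraph.  It assumes the normalization was a constant shift (the absolute value of the lowest score, plus 20) and that the "variation explained" figures are R-squared values from a simple straight-line fit; the post does not spell out either detail.  The function names and sample numbers are made up for illustration and are not the actual team data.

```python
def shift_to_positive(values, padding=20):
    """Shift every value by |min| + padding so that none is negative.

    Assumption: this is one reading of "added the absolute value of
    each score + 20"; the original workbook may have done it differently.
    """
    offset = abs(min(values)) + padding
    return [v + offset for v in values]


def r_squared(xs, ys):
    """Share of the variation in ys explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Made-up run differentials and win totals, for illustration only.
run_diffs = [-150, -60, 0, 45, 120]
wins = [66, 75, 81, 86, 93]

shifted = shift_to_positive(run_diffs)  # safe to draw as bar lengths
fit = r_squared(run_diffs, wins)        # unchanged by the shift
```

Because the shift adds the same constant to every team, it changes only how the bars are drawn; the fit between the two series stays the same.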

The visualization illustrates the tight correlation between RunDiff and OPSDiff, as the respective bars for each team are generally equidistant from the center line, creating an inverted V shape when sorted by RunDiff.  In terms of average wins over the decade, there are few surprises: the Yankees, Red Sox, Cardinals, Angels, and Braves make up the top 5.  However, St. Louis did a much better job of winning efficiently, paying less per win than the other winningest teams (<$1M per win).


The viz also illustrates the success of small market teams such as Oakland and Minnesota, which both averaged roughly 88 wins while spending the 3rd and 4th least per win, respectively.  If you filter the visualization for teams that averaged over 85 wins during the decade, it really drives home how impressive those two teams’ front offices have been at assembling winning ball clubs with lower payrolls.  No other team that averaged >85 wins paid less than $975K per win.  Oakland looks even more impressive when you isolate the data for years in which teams qualified for the playoffs.  Oakland averaged 98.5 wins during the seasons it made the playoffs, and did so spending only $478K per win.



What about the big spenders?  The five biggest spenders were the Yankees, Red Sox, Mets, Dodgers, and Cubs.  The Yankees spent an astounding $1.8M per win during the decade, but they also averaged the most wins with 97.  Some will say this provides evidence that the Yankees, and other big market teams, simply buy wins and championships.  However, only 17% of the variation in wins was explained by payroll during the decade.  Moreover, while the Yankees occupied 6 of the top 10 single-season spots in terms of cost per win, they were the only team in that group to earn a positive run differential.  The Cubs, Mets, Mariners and Tigers all finished under .500 and missed the playoffs, while those Yankee teams qualified for the playoffs 5 out of 6 years and won one World Series.  Yes, the Yankees spend significantly more per win, but they spend more wisely than many other deep-pocketed teams.

Teams that made the playoffs averaged a little over $1M per win in the years they qualified, with Wild Card teams ($1.030M) spending a tad more than Division winners ($1.006M), about $24K per win on average.  World Series winners spent $1.08M per win in their winning years compared to $1.002M for other playoff teams.  Teams that failed to make the playoffs averaged $923K per win.

The best team of the decade in terms of run differential?  The 2001 Seattle Mariners, who amassed an incredible +300 RunDiff.  Even with that total they were only expected to win 111 games; they would go on to win 116.  The Mariners had only the 11th highest payroll that year and so paid a measly $644K per win.  The absolute worst team of the decade?  The 2003 Detroit Tigers, who earned a RunDiff of -337 and actually won fewer games than expected (43 vs. 47).  Given their ineptitude on the field, the Tigers paid $1.14M per win even though their total payroll for the year was only $49M.
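
The cost-per-win figures quoted throughout are simply payroll divided by wins.  A minimal Python sketch using the 2003 Tigers numbers from the paragraph above (the function name is made up for this example):

```python
def cost_per_win(payroll_dollars, wins):
    """Payroll divided by wins, in dollars per win."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return payroll_dollars / wins


# 2003 Tigers, as quoted above: roughly $49M in payroll and 43 wins.
tigers_2003 = cost_per_win(49_000_000, 43)
print(f"${tigers_2003 / 1e6:.2f}M per win")  # prints "$1.14M per win"
```

A low payroll does not guarantee a low cost per win: with so few wins in the denominator, even a $49M roster ends up expensive on a per-win basis.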

Luckiest team?  The 2005 Diamondbacks, who won 77 games despite a RunDiff of -160 (only 64 expected wins).  Hardest luck team?  The 2006 Indians, who only won 78 games with a +88 RunDiff that should have translated into 90 wins.
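
The post does not say how expected wins were calculated.  One common approach is the Pythagorean expectation, sketched below in Python as an assumption rather than as the method actually used here.  The exponent of 2, the 162-game season, and the function names are choices made for this example.  The 2001 Mariners' win total and +300 RunDiff come from the post; the run totals behind that differential (927 scored, 627 allowed) are supplied for illustration and do not appear in it.

```python
def expected_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Pythagorean expectation: wins implied by runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def luck(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual wins minus expected wins; positive means a lucky team."""
    return actual_wins - expected_wins(runs_scored, runs_allowed, games)


# Assumed run totals giving a +300 differential (see the note above).
mariners_expected = expected_wins(927, 627)
mariners_luck = luck(116, 927, 627)
print(round(mariners_expected), round(mariners_luck, 1))
```

Under these assumptions the Mariners come out at about 111 expected wins, in line with the figure quoted earlier, and the absolute value of the luck number is what would size each bubble in the scatter plot.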

There are tons of ways to manipulate the visualizations and cut the data.  Hopefully viewing the data this way is helpful, illuminating some things we didn’t know and driving home others we had a hunch about.  This is my first attempt to visualize this data, so please feel free to send along any and all comments so I can improve it.

Author’s Note: Thanks to a very helpful comment by Joshua Maciel, I have updated the visualization.  Here is a link to the original version for those who are interested.