The following is a guest post by Cullen Hendrix of the University of Denver.  

If you’ve read or seen Moneyball, the following anecdote will be familiar to you: Baseball is a complex sport requiring a diverse, often hard-to-quantify[1] skillset. Before the 2000s, baseball talent scouts relied heavily on a variety of heuristics marked by varying degrees of sanity: whether the player had a toned physique, whether the player had an attractive girlfriend, and whether or not the player seemed arrogant (this was seen as a good thing). Billy Beane and the Oakland Athletics changed things with a radical concept: instead of relying completely on hoary seers and their tea-leaf reading, you might look at data on players’ actual productivity and form assessments that way. This thinking was revolutionary little more than a decade ago; now it’s the way every baseball team does business.

Roughly around the same time, physicist Jorge Hirsch was starting a revolution of his own. Hirsch was ruminating on a simple question: what constitutes a productive scholar? Since the implicit answer to this question informs all our hiring, promotion and firing decisions, the answer is pretty important. In 2005, Hirsch published “An Index to Quantify an Individual’s Scientific Research Output”, which introduced the world to the h-index. Like most revolutionary ideas, its brilliance lay in its simplicity. Here’s the abstract:

I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher.

Thus, a metrics revolution was born. Hirsch had distilled information on citations and numbers of papers published into a simple metric that could be used to compare researchers and forecast their scholarly trajectory. That metric is at the heart of Google Scholar’s attempts to rank journals and forms the core of its scholar profiles. With Google’s constant indexing and unrivaled breadth, it is fast becoming the industry standard for citation metrics. Google’s scholar profiles track three basic statistics: total citations, the h-index, and the i10 index, which is simply the number of articles/books/etc. a researcher has published that have at least 10 citations.
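For concreteness, both metrics are easy to compute from a list of per-paper citation counts. The sketch below is illustrative (the function names and the sample data are mine, not Google Scholar’s implementation):

```python
def h_index(citations):
    """Largest h such that the researcher has h papers with >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)

# A hypothetical record of six papers:
papers = [48, 22, 10, 9, 4, 0]
print(h_index(papers))    # 4: four papers each have at least 4 citations
print(i10_index(papers))  # 3: three papers have at least 10 citations
```

Note that the h-index is bounded above by the number of papers published, which is one reason raw citation totals and the h-index can tell quite different stories about the same scholar.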

So, what do these metrics say about productivity in the field of international relations, and why should we care?

Citation metrics are worth investigating for at least two reasons. First, metrics tell us something about how the work is being used. Stellar book reviews, gushing tenure letters, and article awards may tell us how the work is perceived, but those perceptions can be highly idiosyncratic. And, if we’re being honest, they can be driven by a host of factors (how well you are liked personally, the kind of shadow your advisor casts, whether the letter writer had just survived a shark attack, membership in the Skull and Bones Society, etc.) that have little if anything to do with the quality and/or the usefulness of the work in question. Yes, metrics do not completely get around these biases – see Maliniak et al. on the gender citation gap in IR – but a systematic bias like that is much easier to measure and account for than the idiosyncratic ones listed above. Show me a book that was reviewed harshly but eventually cited 1,000 times and I’ll show you a game changer. Show me a book that was reviewed glowingly and that has had virtually no quantifiable impact and I’ll show you a dud.

Second, this information may be useful to people submitting files for tenure and promotion to full. Before I started putting together my file, I realized I was completely unaware of what constituted a good citation record for someone who had been out for seven years. I’d heard in various places that an h-index equal to or greater than years since PhD was a good rule of thumb, but that standard seems to have been designed with physicists in mind, and physicists publish faster than most people type. If you hear grumblings that you should have an h-index of such-and-such come tenure time, it would be good to know whether that bar is low or high, given prevailing citation patterns in the discipline.

With the help of RAs, I compiled data on the 1,000 most highly cited IR scholars according to Google Scholar.[2] Then, the RAs collected supplemental information on the year during which their PhD had been granted (PhD Year). The sample is one of convenience, based on those individuals who listed “International Relations” as one of the tags in their profile and for whom the year their PhD was granted could be ascertained. For this reason, many highly (and lowly) cited individuals did not appear on the list.[3] However, the list includes all sorts: realists, liberals, constructivists, feminists, formal theorists, etc., and at all manner of institutions, though the bias is toward research universities. The list appears to be dominated by people at universities in the USA, UK, Canada and Australia.

Descriptive statistics for the group are as follows:

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| GS citations | 713 | 915.2 | 2804.9 | 0 | 40978 |
| ln GS citations | 713 | 4.8 | 2.2 | 0 | 10.6 |
| h-index | 713 | 8.5 | 8.9 | 0 | 73 |
| i10 index | 713 | 10.6 | 18.3 | 0 | 188 |
| ln i10 index | 713 | 1.6 | 1.3 | 0 | 5.2 |
| Most Cited | 713 | 184.9 | 567.7 | 0 | 9429 |
| ln Most Cited | 713 | 3.4 | 2.1 | 0 | 9.2 |
| Most Cited Solo | 713 | 121.3 | 361.9 | 0 | 4620 |
| ln Most Cited Solo | 713 | 3.2 | 1.9 | 0 | 8.4 |
| PhD Year | 713 | 2003.5 | 9.4 | 1961 | 2015 |

I plan to crunch the numbers a variety of different ways. For the moment, a cursory look at the data yields some potentially interesting insights:

  • Most scholars are not cited all that frequently. It’s time to take a deep breath when worrying about your citation count. Yes, the Joe Nyes and Kathryn Sikkinks of the world can give us all a little count envy, but the median total citation count for all 713 scholars in the sample was 119. That includes at least one person who got their PhD while John F. Kennedy was still president. If we just look at people who got their PhD since 2000, the median is 57. That the mean is so much higher than the median tells us what many of us suspect is true: it’s a pretty unequal world. The top 10% of cite-getters in the sample account for ~75% of all the citations.
  • The “h-index ≥ years since PhD” rule of thumb for scholarly productivity is probably a bit high, at least for IR scholars. The mean ratio of h-index to years since PhD in this sample is closer to 0.76. A tenure case with an h-index of 6 six years out from their PhD would be in the 75th percentile of this group. This information is the kind of thing that should be conveyed to university-wide promotion and tenure committees, as notions of what constitutes productivity vary widely across fields. The 500th ranked IR scholar has 71 GS citations and an h-index of 5; the 500th ranked physicist has a few more than that.
  • Co-authoring is pretty common. For 59% of scholars in the sample, their most highly cited article/book was solo-authored; for the remaining 41%, their most highly cited article/book was co-authored. Interestingly, it breaks down that way even if we just look at people who got their PhD since 2000. Co-authoring, at least of IR scholars’ most influential works, does not appear to be such a recent fad.
  • Seriously? Nearly 30% of IR scholars don’t have a readily available CV that indicates the year they received their PhD? I feel no further comment is necessary.
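The top-decile concentration figure in the first bullet is simple to compute from a vector of total citation counts. Here is a minimal sketch with made-up numbers (the function name and data are illustrative, not the actual sample):

```python
def top_share(counts, top_frac=0.10):
    """Share of all citations held by the most-cited top_frac of scholars."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Ten hypothetical scholars, one of them a citation all-star:
counts = [900] + [30] * 9
print(round(top_share(counts), 2))  # 0.77 — one scholar holds ~77% of citations
```

The same mean-versus-median gap the post describes falls out of data like this: the mean of the hypothetical counts above is 117, while the median is 30.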

Diving a little deeper, I used locally weighted scatterplot smoothing to estimate predicted GS citations, h-index and i10 scores as a function of PhD year. The results are as follows, and can be interpreted as the predicted mean score for each metric for a given PhD year; I only go back to 1990, as data are rather sparse before then:

| PhD Year | Predicted GS Cites | Predicted h-index | Predicted i10 |
| --- | --- | --- | --- |
| 1990 | 2685.4 | 17.2 | 26.9 |
| 1991 | 2510.9 | 16.6 | 25.5 |
| 1992 | 2339.8 | 15.9 | 24.2 |
| 1993 | 2174.3 | 15.3 | 22.9 |
| 1994 | 2012.4 | 14.7 | 21.5 |
| 1995 | 1852.2 | 14.0 | 20.2 |
| 1996 | 1698.1 | 13.3 | 18.9 |
| 1997 | 1549.4 | 12.7 | 17.6 |
| 1998 | 1399.8 | 12.0 | 16.3 |
| 1999 | 1260.7 | 11.3 | 15.0 |
| 2000 | 1132.8 | 10.7 | 13.8 |
| 2001 | 1006.5 | 10.1 | 12.7 |
| 2002 | 880.4 | 9.4 | 11.4 |
| 2003 | 765.2 | 8.7 | 10.2 |
| 2004 | 640.6 | 8.1 | 9.0 |
| 2005 | 506.3 | 7.4 | 7.8 |
| 2006 | 393.7 | 6.7 | 6.5 |
| 2007 | 305.5 | 6.0 | 5.5 |
| 2008 | 223.3 | 5.3 | 4.5 |
| 2009 | 170.8 | 4.8 | 3.6 |
| 2010 | 135.4 | 4.3 | 3.0 |
| 2011 | 108.9 | 3.8 | 2.4 |
| 2012 | 87.0 | 3.3 | 1.9 |
| 2013 | 64.9 | 2.9 | 1.4 |
| 2014 | 47.2 | 2.6 | 1.1 |
| 2015 | 46.2 | 2.5 | 1.0 |
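For readers who want to reproduce the smoothing step, here is a minimal pure-Python sketch of locally weighted regression with tricube weights. It is a single smoothing pass under simplified assumptions — the classic LOWESS procedure adds robustness iterations, and in practice a library implementation (e.g., statsmodels’ lowess) is the sensible choice:

```python
def lowess(xs, ys, frac=0.5):
    """One-pass locally weighted linear regression with tricube weights."""
    n = len(xs)
    k = max(2, int(frac * n))  # neighborhood size: a fraction of the data
    fitted = []
    for x0 in xs:
        # bandwidth: distance to the k-th nearest point
        d_max = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
        # tricube weights fall smoothly to zero at the neighborhood's edge
        w = [(1 - min(abs(x - x0) / d_max, 1.0) ** 3) ** 3 for x in xs]
        # weighted least-squares line through the neighborhood, evaluated at x0
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, xs))
        swy = sum(wi * yi for wi, yi in zip(w, ys))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, xs))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, xs, ys))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate neighborhood: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted

# Hypothetical usage: smooth citation counts against PhD year
years = [2005, 2006, 2007, 2008, 2009, 2010]
cites = [506, 394, 306, 223, 171, 135]
smoothed = lowess(years, cites, frac=0.8)
```

Because each fitted value borrows strength only from nearby PhD years, the curve can bend with the data rather than forcing a single global trend on a quarter-century of cohorts.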

These data are pretty clearly biased upwards by the presence of publishing all-stars (the aforementioned 10%, plus very highly cited junior and mid-career people) with citation counts that skew the distribution. Here’s the same table, but substituting the observed median values by PhD year:

| PhD Year | Median GS Cites | Median h-index | Median i10 | N |
| --- | --- | --- | --- | --- |
| 1990 | 1786.0 | 18.0 | 24.0 | 9 |
| 1991 | 2160.0 | 19.0 | 27.0 | 9 |
| 1992 | 1491.5 | 19.0 | 29.5 | 10 |
| 1993 | 1654.0 | 18.0 | 22.5 | 16 |
| 1994 | 1643.0 | 15.0 | 19.0 | 9 |
| 1995 | 1983.0 | 16.0 | 17.0 | 7 |
| 1996 | 583.5 | 9.5 | 9.5 | 20 |
| 1997 | 396.0 | 10.0 | 10.0 | 15 |
| 1998 | 376.0 | 10.0 | 11.0 | 11 |
| 1999 | 755.0 | 12.5 | 14.5 | 24 |
| 2000 | 701.0 | 11.0 | 12.0 | 19 |
| 2001 | 301.0 | 9.5 | 9.5 | 18 |
| 2002 | 153.5 | 6.0 | 4.0 | 28 |
| 2003 | 220.0 | 8.0 | 7.0 | 28 |
| 2004 | 213.0 | 7.0 | 5.0 | 25 |
| 2005 | 144.0 | 6.0 | 4.0 | 15 |
| 2006 | 105.0 | 5.0 | 3.5 | 38 |
| 2007 | 98.0 | 5.0 | 4.0 | 46 |
| 2008 | 78.0 | 5.0 | 2.0 | 54 |
| 2009 | 76.0 | 5.0 | 3.0 | 29 |
| 2010 | 34.5 | 3.0 | 1.0 | 42 |
| 2011 | 22.0 | 3.0 | 1.0 | 46 |
| 2012 | 21.0 | 2.0 | 0.0 | 41 |
| 2013 | 19.0 | 2.0 | 1.0 | 42 |
| 2014 | 17.0 | 2.0 | 0.0 | 33 |
| 2015 | 8.0 | 1.0 | 0.0 | 17 |

So if you’re a couple years out and your citation count is barely cracking double digits, don’t worry: you’re in pretty good company.

Some caveats are in order. First, everyone in the sample both has a Google Scholar profile and self-identifies as an IR scholar; there is self-selection bias all over these results. The nature of the bias is probably upward, inflating the sample citation metrics relative to those of the population of interest. This rests on the assumption that people who believe they are doing well, by these metrics, will be more likely to make this information public. I believe this issue is particularly acute for the most recent PhDs in the sample. Second, there is really no way of gauging the bias stemming from excluding those who are IR scholars but who do not self-identify as such. Third, I believe metrics should be a complement to, not a substitute for, our subjective evaluations of the work. They’re another useful piece of information in forming the assessments of bodies of scholarly work that make or break tenure, promotion, and hiring processes.

Metrics will never fully supplant subjective evaluations of theoretical, empirical and normative merit. But they provide a necessary complement to them. So, what did I miss? And what would you like to see done with these data?

[1] As was thought at the time, anyway.

[2] Coding took place between 7/16/15 and 8/3/15.

[3] Both my PhD mentors, Steph Haggard and Kristian Gleditsch, were left off the list. You’re killing me!