The following is a guest post by Cullen Hendrix of the University of Denver.
If you’ve read or seen Moneyball, the following anecdote will be familiar to you: Baseball is a complex sport requiring a diverse, often hard-to-quantify[1] skillset. Before the 2000s, baseball talent scouts relied heavily on a variety of heuristics marked by varying degrees of sanity: whether the player had a toned physique, whether the player had an attractive girlfriend, and whether the player seemed arrogant (this was seen as a good thing). Billy Beane and the Oakland Athletics changed things with a radical concept: instead of relying completely on hoary seers and their tea-leaf reading, you might look at the data on players’ actual productivity and form assessments that way. This thinking was revolutionary little more than a decade ago; now it’s the way every baseball team does business.
Around the same time, physicist Jorge Hirsch was starting a revolution of his own. Hirsch was ruminating on a simple question: what constitutes a productive scholar? Since the implicit answer to this question informs all our hiring, promotion, and firing decisions, it’s a pretty important one. In 2005, Hirsch published “An Index to Quantify an Individual’s Scientific Research Output”, which introduced the world to the h-index. Like most revolutionary ideas, its brilliance lay in its simplicity. Here’s the abstract:
I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher.
Thus, a metrics revolution was born. Hirsch had distilled information on citations and numbers of papers published into a simple metric that could be used to compare researchers and forecast their scholarly trajectory. That metric is at the heart of Google Scholar’s attempts to rank journals and forms the core of its scholar profiles. With Google’s constant indexing and unrivaled breadth, it is fast becoming the industry standard for citation metrics. Google’s scholar profiles track three basic statistics: total citations, the h-index, and the i10 index, which is simply the number of articles/books/etc. a researcher has published that have at least 10 citations.
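To make the definitions concrete, here is a minimal sketch (my own illustration, not Hirsch’s formula as implemented by Google) of how both indices fall out of a list of per-paper citation counts; the counts below are made up:

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    # Because ranked is non-increasing, c >= rank holds exactly for ranks 1..h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

papers = [48, 33, 30, 12, 9, 7, 4, 2, 1, 0]  # hypothetical citation counts
print(h_index(papers))    # 6: six papers each have at least 6 citations
print(i10_index(papers))  # 4: four papers have 10 or more citations
```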
So, what do these metrics say about productivity in the field of international relations, and why should we care?
Citation metrics are worth investigating for at least two reasons. First, metrics tell us something about how the work is being used. Stellar book reviews, gushing tenure letters, and article awards may tell us how the work is perceived, but those perceptions can be highly idiosyncratic. And, if we’re being honest, they can be driven by a host of factors (how well you are liked personally, the kind of shadow your advisor casts, whether the letter writer had just survived a shark attack, membership in the Skull and Bones Society, etc.) that have little if anything to do with the quality and/or the usefulness of the work in question. Yes, metrics do not completely get around these biases – see Maliniak et al. on the gender citation gap in IR – but that kind of bias is much easier to account for than the idiosyncrasies of reviews and letters. Show me a book that was reviewed harshly but eventually cited 1,000 times and I’ll show you a game changer. Show me a book that was reviewed glowingly and that has had virtually no quantifiable impact and I’ll show you a dud.
Second, this information may be useful to people submitting files for tenure and promotion to full professor. When I started putting together my file, I realized I was completely unaware of what constituted a good citation record for someone who had been out for seven years. I’d heard in various places that an h-index equal to or greater than years since PhD was a good rule of thumb, but that standard seems to have been designed with physicists in mind, and physicists publish faster than most people type. If you hear grumblings that you should have an h-index of such-and-such come tenure time, it would be good to know whether that bar is low or high, given prevailing citation patterns in the discipline.
With the help of RAs, I compiled data on the 1,000 most highly cited IR scholars according to Google Scholar.[2] Then, the RAs collected supplemental information on the year during which their PhD had been granted (PhD Year). The sample is one of convenience, based on those individuals who listed “International Relations” as one of the tags in their profile and for which the year their PhD was granted could be ascertained. For this reason, many highly (and lowly) cited individuals did not appear on the list.[3] However, the list includes all sorts: realists, liberals, constructivists, feminists, formal theorists, etc., and at all manner of institutions, though the bias is toward research universities. The list appears to be dominated by people at universities in the USA, UK, Canada and Australia.
Descriptive statistics for the group are as follows:
| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| GS citations | 713 | 915.2 | 2804.9 | 0 | 40978 |
| ln GS citations | 713 | 4.8 | 2.2 | 0 | 10.6 |
| h-index | 713 | 8.5 | 8.9 | 0 | 73 |
| i10 index | 713 | 10.6 | 18.3 | 0 | 188 |
| ln i10 index | 713 | 1.6 | 1.3 | 0 | 5.2 |
| Most Cited | 713 | 184.9 | 567.7 | 0 | 9429 |
| ln Most Cited | 713 | 3.4 | 2.1 | 0 | 9.2 |
| Most Cited Solo | 713 | 121.3 | 361.9 | 0 | 4620 |
| ln Most Cited Solo | 713 | 3.2 | 1.9 | 0 | 8.4 |
| PhD Year | 713 | 2003.5 | 9.4 | 1961 | 2015 |
I plan to crunch the numbers in a variety of ways. For the moment, a cursory look at the data yields some potentially interesting insights:
- Most scholars are not cited all that frequently. It’s time to take a deep breath when worrying about your citation count. Yes, the Joe Nyes and Kathryn Sikkinks of the world can give us all a little count envy, but the median total citation count for all 713 scholars in the sample was 119. That includes at least one person who got their PhD while John F. Kennedy was still president. If we just look at people who got their PhD since 2000, the median is 57. That the mean is so much higher than the median tells us what many of us suspect is true: it’s a pretty unequal world. The top 10% of cite-getters in the sample account for ~75% of all the citations.
- The “h-index ≥ years since PhD” rule of thumb for scholarly productivity is probably a bit high, at least for IR scholars. The mean ratio of h-index to years since PhD in this sample is closer to 0.76. A tenure case with an h-index of 6 six years out from their PhD would be in the 75th percentile of this group. This information is the kind of thing that should be conveyed to university-wide promotion and tenure committees, as notions of what constitutes productivity vary widely across fields. The 500th ranked IR scholar has 71 GS citations and an h-index of 5; the 500th ranked physicist has a few more than that.
- Co-authoring is pretty common. For 59% of scholars in the sample, their most highly cited article/book was solo-authored; for the remaining 41%, their most highly cited article/book was co-authored. Interestingly, it breaks down that way even if we just look at people who got their PhD since 2000. Co-authoring, at least of IR scholars’ most influential works, does not appear to be such a recent fad.
- Seriously? Nearly 30% of IR scholars don’t have a readily available CV that indicates the year they received their PhD? I feel no further comment is necessary.
Diving a little deeper, I used locally weighted scatterplot smoothing (LOWESS) to estimate predicted GS citations, h-index and i10 scores as a function of PhD year. The results are as follows and can be interpreted as the predicted mean score on each metric for a given PhD year; I only go back to 1990, as data are rather sparse before then (a code sketch of this kind of smoothing follows the table):
| PhD Year | Predicted GS Cites | Predicted h-index | Predicted i10 |
| --- | --- | --- | --- |
| 1990 | 2685.4 | 17.2 | 26.9 |
| 1991 | 2510.9 | 16.6 | 25.5 |
| 1992 | 2339.8 | 15.9 | 24.2 |
| 1993 | 2174.3 | 15.3 | 22.9 |
| 1994 | 2012.4 | 14.7 | 21.5 |
| 1995 | 1852.2 | 14.0 | 20.2 |
| 1996 | 1698.1 | 13.3 | 18.9 |
| 1997 | 1549.4 | 12.7 | 17.6 |
| 1998 | 1399.8 | 12.0 | 16.3 |
| 1999 | 1260.7 | 11.3 | 15.0 |
| 2000 | 1132.8 | 10.7 | 13.8 |
| 2001 | 1006.5 | 10.1 | 12.7 |
| 2002 | 880.4 | 9.4 | 11.4 |
| 2003 | 765.2 | 8.7 | 10.2 |
| 2004 | 640.6 | 8.1 | 9.0 |
| 2005 | 506.3 | 7.4 | 7.8 |
| 2006 | 393.7 | 6.7 | 6.5 |
| 2007 | 305.5 | 6.0 | 5.5 |
| 2008 | 223.3 | 5.3 | 4.5 |
| 2009 | 170.8 | 4.8 | 3.6 |
| 2010 | 135.4 | 4.3 | 3.0 |
| 2011 | 108.9 | 3.8 | 2.4 |
| 2012 | 87.0 | 3.3 | 1.9 |
| 2013 | 64.9 | 2.9 | 1.4 |
| 2014 | 47.2 | 2.6 | 1.1 |
| 2015 | 46.2 | 2.5 | 1.0 |
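For anyone who wants to reproduce this kind of smoothing, here is a minimal sketch. The file name ir_scholars.csv, the column names (phd_year, citations, h_index, i10), and the smoothing fraction are all assumptions for illustration, not the inputs or settings used for the table above:

```python
# Sketch of LOWESS-smoothed citation metrics by PhD year (assumed inputs).
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

df = pd.read_csv("ir_scholars.csv")  # hypothetical data file
years = np.arange(1990, 2016)        # PhD-year grid, as in the table above

for metric in ["citations", "h_index", "i10"]:
    # lowess returns (x, fitted) pairs sorted by x; interpolate onto the grid
    fit = lowess(df[metric], df["phd_year"], frac=0.5)
    preds = np.interp(years, fit[:, 0], fit[:, 1])
    print(metric, np.round(preds, 1))
```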
These data are pretty clearly biased upwards by the presence of publishing all-stars (the aforementioned 10%, plus very highly cited junior and mid-career people) with citation counts that skew the distribution. Here’s the same table, but substituting the observed median values by PhD year:
| PhD Year | Median GS Cites | Median h-index | Median i10 | N |
| --- | --- | --- | --- | --- |
| 1990 | 1786.0 | 18.0 | 24.0 | 9 |
| 1991 | 2160.0 | 19.0 | 27.0 | 9 |
| 1992 | 1491.5 | 19.0 | 29.5 | 10 |
| 1993 | 1654.0 | 18.0 | 22.5 | 16 |
| 1994 | 1643.0 | 15.0 | 19.0 | 9 |
| 1995 | 1983.0 | 16.0 | 17.0 | 7 |
| 1996 | 583.5 | 9.5 | 9.5 | 20 |
| 1997 | 396.0 | 10.0 | 10.0 | 15 |
| 1998 | 376.0 | 10.0 | 11.0 | 11 |
| 1999 | 755.0 | 12.5 | 14.5 | 24 |
| 2000 | 701.0 | 11.0 | 12.0 | 19 |
| 2001 | 301.0 | 9.5 | 9.5 | 18 |
| 2002 | 153.5 | 6.0 | 4.0 | 28 |
| 2003 | 220.0 | 8.0 | 7.0 | 28 |
| 2004 | 213.0 | 7.0 | 5.0 | 25 |
| 2005 | 144.0 | 6.0 | 4.0 | 15 |
| 2006 | 105.0 | 5.0 | 3.5 | 38 |
| 2007 | 98.0 | 5.0 | 4.0 | 46 |
| 2008 | 78.0 | 5.0 | 2.0 | 54 |
| 2009 | 76.0 | 5.0 | 3.0 | 29 |
| 2010 | 34.5 | 3.0 | 1.0 | 42 |
| 2011 | 22.0 | 3.0 | 1.0 | 46 |
| 2012 | 21.0 | 2.0 | 0.0 | 41 |
| 2013 | 19.0 | 2.0 | 1.0 | 42 |
| 2014 | 17.0 | 2.0 | 0.0 | 33 |
| 2015 | 8.0 | 1.0 | 0.0 | 17 |
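The median table itself is a simple group-by; a minimal sketch, again assuming the hypothetical file and column names from the smoothing example above:

```python
# Observed medians (and cell counts) by PhD year, using the same assumed inputs.
import pandas as pd

df = pd.read_csv("ir_scholars.csv")  # hypothetical data file
by_year = df.groupby("phd_year")
medians = by_year[["citations", "h_index", "i10"]].median()
medians["N"] = by_year.size()
print(medians.loc[1990:2015])
```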
So if you’re a couple years out and your citation count is barely cracking double digits, don’t worry: you’re in pretty good company.
Some caveats are in order. First, everyone in the sample both has a Google Scholar profile and self-identifies as an IR scholar; there is self-selection bias all over these results. The nature of the bias is probably upward, inflating the sample citation metrics relative to those of the population of interest. This rests on the assumption that people who believe they are doing well, by these metrics, will be more likely to make this information public. I believe this issue is particularly acute for the most recent PhDs in the sample. Second, there is really no good way to gauge the bias stemming from excluding those who are IR scholars but do not self-identify as such. Third, I believe metrics should be a complement to, not a substitute for, our subjective evaluations of the work. They’re another useful piece of information in forming the assessments of bodies of scholarly work that make or break tenure, promotion, and hiring processes.
Metrics will never fully supplant subjective evaluations of theoretical, empirical and normative merit. But they provide a necessary complement to them. So, what did I miss? And what would you like to see done with these data?
[1] As was thought at the time, anyway.
[2] Coding took place between 7/16/15 and 8/3/15.
[3] Both my PhD mentors, Steph Haggard and Kristian Gleditsch, were left off the list. You’re killing me!
Cullen, thanks for offering a fact base for us to think about! Interesting stuff here. I would only caution you to take more seriously the caveat you add at the end about treating GS citations as only one tool among many. You undermine it earlier in your post when you imply that published reviews give us little insight on quality, whereas GS citations do. I have a working paper that highlights some of the key limitations to GS citations, especially in regard to how we think about scholarly influence. I find that many IR pieces with lots of GS citations are rarely taught (at least in the US), and conversely some of the “canonical”, widely-taught articles have relatively few GS citations. I’m reluctant to say more because the working paper is *almost* through the review process, but not yet.
Jeff, definitely looking forward to reading it when it eventually comes out. I’d only point out the following: whether something shows up on graduate and undergraduate syllabi is another imperfect measure of impact, based on specific assumptions about what matters to the discipline. GS-based metrics are no different, but I’d argue they make weaker assumptions about what matters.
Awesome article. Really, really useful and interesting. Just to push on one thing for further discussion – I can think of two places where someone might have a pretty low citation count, but I’m not sure it would be fair to call their work a dud: 1) If you wrote *the* book about something, and it’s a great book, but most scholars are chasing a kickball elsewhere on the field. That is to say, if the book is actually really good, but everyone is obsessed with whatever the thing of the moment is, I’m not sure it means your work is a dud. It might mean it’s not at the core of where the field is at a given point (and over an entire career, perhaps that evens out). 2) If you wrote *the* book or article that basically ends a research agenda or debate on a topic by resolving things to the general satisfaction of the field, and then people move on to something else.
Great piece Cullen, extremely interesting! A quick point about why we may cite certain work. Many see citations in terms of a network approach (there’s actually a large academic literature on this in several disciplines). Empirically, citation data seem to generate a “scale-free network” with a power-law distribution — i.e. preferential attachment in which newcomers form ties with existing nodes with a probability proportional to the degree count of those existing nodes. Network “hubs” that are already heavily cited continue to get cited because they are often first on the scene in an emerging research agenda, though sometimes not the best work. I’m not sure the resulting citation data metrics actually capture scholarly “productivity” (which you recognize in the piece). If the goal is objective, data-driven evaluation, these network dynamics seem to suggest an issue with measurement validity. Citation metrics might indicate “innovation,” “agenda-setting,” “academic creativity,” “fad-capture,” or many other concepts in addition to or aside from “productivity.” Just a few things to think about… Better metrics certainly could complement subjective evaluations and I look forward to seeing how this develops!
Steve, this is really thoughtful stuff. I am in full agreement that reputation and human networks go a long way in explaining the scholarly networks that arise. My quick-and-dirty response is simply that the same issues bedevil non-metrics based evaluations, including what you mean when you say “the best work” (hence the part about being well liked, of having an influential advisor, etc.). There is no equivalent of AP exam grading (anonymous graders grading anonymous writers with no specific stake in the outcome) for academic work. It exists in a social space. I’m just more worried about the small-sample properties of that social space (i.e., recommendation/tenure letter writers and reviewers) than those that might arise from analyzing citations.
I think you are definitely on to something in that citations are probably capturing a lot of other stuff, although I’d like to think innovation, agenda-setting, creativity, and even fad capture are net positives in terms of evaluating scholarly profiles.
Could you provide a list of the 1,000 most cited IR scholars?
The Moneyball reference is interesting. Sabermetrics and quantitative reasoning have actually been downplayed in recent years. It was something of a fad. The qualitative scouts are back in demand.
Nowadays, virtually all teams use both.
Cullen – excellent post, really enjoyed reading it. A couple of things to add, based on my 2 articles on citation patterns in PS. You wonder about selection bias, which implies the counts may be higher than IRL. True, but GS also under-counts scholarly impact because it does not count citations articles receive in books – and comparativists and IR scholars publish lots of books. (The only way to find cites of an article in a book is to enter the article title as a quote in Google Books. At least for the books Google has scanned/digitized, this will give you an idea of how many times your article has been cited in a book.) Second, GS doesn’t accurately count citations OF books, again because it can’t pick up the citations that a book receives in other books. So GS is really unfriendly to scholars whose “big hits” are books, and also under-counts scholarship that has a big impact among other scholars who publish books.
David,
Thanks for your response. Do we have any idea how large the undercounting effect is? I searched Activists Beyond Borders and it certainly seems highly cited: https://scholar.google.com/scholar?hl=en&q=activists+beyond+borders&btnG=&as_sdt=1%2C6&as_sdtp=
Cullen – Let me reiterate that there are two issues at stake. The first is that GS undercounts citations that ARTICLES receive, because it undercounts citations that articles receive IN BOOKS. Counting cites articles receive in books reveals GS undercounts cites by about 50%, depending on the subfield (undercounting is worse in IR and CP). The second issue is that GS doesn’t count citations that BOOKS receive very well, either in articles or in other books. Activists Beyond Borders is probably not an optimal example, as it is in the top .000001% of whatever distribution you choose to use. Somewhat solipsistically, in the 2013 PS article I used my own 2003 book as an example. GS only returned 6 citations IN BOOKS in the first five years, but GB showed many more. It’s not quite ‘fair’ to compare books against articles (in my dept we resolve the apples/oranges argument by counting a Univ press book as equal to 3 top-tier articles, FWIW) but after a slog of counting by hand (web scraping wouldn’t work b/c of lots of garbage/repeats in GB) I could confirm that that approach actually makes some sense b/c the average university-press book receives about 3X the citations that an SSCI-indexed article does. About 1/3 of those cites come in articles, 2/3 in other books.
David – thanks. I hadn’t seen your PS piece, so this is helpful. Knowing something about undercounting would presumably allow for adjustment.
The average uni-press book receives 3X the citations as an SSCI-indexed article? What’s three times zero? ;-)
Haha, but Gary King’s famous line (which I think he cribbed from someone else) vastly overstates the incidence of zero-cited work, especially (this was the point of the article, after all) when you count citations that articles receive in books.
Really interesting discussion all around, thanks Cullen et al. Just a quick question following David’s point here: I just checked my own GS profile, and 5 of the few cites an article of mine has received are from books – some edited, some single-authored. Is it possible that GS has updated this very recently, or (more likely) that I’m a Luddite who can’t parse the citation count?
GS has gotten better but it still undercounts books, for reasons beyond me. Type the full title of your book into GB, you’ll get some similar and some different results…
Did your intrepid RAs capture anything other than year of PhD? There are probably a lot of other factors that contribute to cite counts (and h-indexes), including institutional incentives for publishing. I wonder if you could also hook this into TRIP’s article citation datasets in some useful way; they would also provide a few useful cross-checks.
Alex, I think a lot of the institutional incentives stuff is rolled up into the self-selection process; that’s not as useful as being able to directly model it, but I do think it’s somewhat accounted for. I do have current institution and PhD-granting institution, so there is some stuff that could be done with that. As you might imagine, number of current institutions > number of PhD-granting institutions.
Thanks for this great post- I’ve been wanting to read more about the h-index and some of the pros and cons for the move in social sciences to readily refer to it. One thing I’ve always been unclear on: how does self citation impact one’s h-index? As in, if a scholar cites themselves as much as possible in each of their publications do they skew their h-index? Is this part of the Moneyball game? Of course it is perfectly acceptable to cite oneself on occasion, but I’ve noticed some scholars that seem to manage to include every publication in each subsequent article. Just curious.
GS doesn’t exclude self-citations.
One way to know if this is a problem: I could pull WoS citation metrics for a subset of these scholars and see how highly correlated their scores are with and without self-citations. My prior is that the self-citation bias is real but declining in total citations.
My impression from 1) staring at these data so long and 2) editing CPS is that self-citing is overblown as a concern, at the end of the scale that matters, i.e. the top 10-15% (or less). It’s just not possible to cite your own article dozens or hundreds of times. At CPS we’ve implemented a “no more than 5 self-cites” rule for any initial submission, partly to preserve anonymity as much as possible and partly because, well, that’s plenty. The issue here isn’t really whether doing so does or does not preserve anonymity, but that we rarely have to return a ms for resubmission because of “over-self-citing” – it happens, but <5%, probably more like 2% of the time.
Good to know David- I like the ‘no more than 5 self cites’ rule! If only it were standard. :)
One other issue is that some scholars’ most cited pieces may not be peer-reviewed. If you look at my GS page, the second highest item is a 2007 report I wrote for the Council on Foreign Relations. Is that appropriate? I don’t know. I teach in an inter-disciplinary policy school. Given the other problems David Samuels flagged about under-counting of citations in books, the lesson I draw is that the metric may be problematic if we adopt it as a signal for tenure, promotion, raises, etc. Ideally, we would have something more appropriate that reflects idiosyncrasies in our field.
I think it could be useful as a part of assessment, but not the whole enchilada. What we have now strikes me as unacceptably idiosyncratic. I’m glad people are offering useful potential additions/correctives to the basic framework.
In re. the CFR piece: I suppose it depends on what kind of place you work. That these citations (and this kind of work) would be heavily/completely discounted strikes me as absolutely maddening. Peer-reviewed vs. non-peer-reviewed would seem to be a bright line, but in my experience I’ve had an easier time getting peer-reviewed stuff accepted than some policy reports!
From my research I think your case is fairly rare; it may have more to do with your work and audience than would be true for most political scientists. It’s far more common among economists, where SSRN and especially NBER papers have made journal publishing almost an afterthought. The reason I wrote those 2 pieces was b/c I was tired of sitting in faculty meetings where Americanists went on and on about “APSR, JOP, AJPS yada yada” without any clue about publishing OR citing patterns in the other subfields. But after all that grubbing around in the data, what did I find? Highly-cited articles are highly-cited no matter what index you use, and the correlation b/w citations an article receives in articles and what it receives in books is, not surprisingly, high. Bottom line: GS does give us a useful proxy for overall impact. It dramatically undercounts. But an article with 3 cites after 10 years on GS is not a high-impact article, and one with 300 is. End of discussion. And although we do spend a lot of time arguing about these things, I don’t think we’ll ever have an unproblematic metric. I like that GS gives total counts, the H-index, and the I10-index – we should have multiple ways of looking at impact, not just one.
Utterly, utterly fascinating – thank you for running the numbers on this, and for doing it in such a thoughtful way. I think an added parameter that would be interesting would be seeing a breakdown by sex of the people who made the list of most cited/most influential. Institutional affiliation would also provide an additional parameter of interest. Thank you again!