This is a guest post co-written by Dan Nexon and Patrick Thaddeus Jackson. Standard disclaimers apply.
Cullen Hendrix’s guest post is a must read for anyone interested in citation metrics and international-relations scholarship. Among other things, Hendrix provides critical benchmark data for those interested in assessing performance using Google Scholar.
We love the post, but we worry about an issue that it raises. Hendrix opens with a powerful analogy: sabermetrics is to baseball as the h-index is to academia. We can build better international-relations departments. With science!
The main argument in favor of the h-index is that it predicts academic productivity. What’s the h-index?
A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.
Google Scholar includes books and anything else it crawls. So if someone has an h-index of 18, that means that they’ve ‘published’ 18 pieces with at least 18 citations each.
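For the concretely minded, the definition reduces to a few lines of code. Here is a minimal Python sketch of our own (an illustration, not part of any official metric toolkit):

```python
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts.

    The h-index is the largest h such that the scholar has at least
    h papers with at least h citations each.
    """
    counts = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:  # this paper still "supports" an h of this rank
            h = rank
        else:
            break
    return h

# e.g. h_index([25, 19, 18, 18, 3]) -> 4
```

Note that the measure is insensitive to outliers: a single paper with 10,000 citations moves the h-index no more than a paper with 19.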
What does the h-index do, exactly? It supposedly predicts both Np (the number of papers that a scholar is likely to publish over a career) and Nc (the number of citations a scholar will likely accrue). Or, at least, it does so better than alternative measures, such as a total count of publications, the mean number of citations, or various other ways of manipulating publication and citation data.
Thus, armed with h-indexes and other predictive measures, the field can objectively rank individuals and departments. Meanwhile, departments themselves can “field” better international-relations and political-science “teams.”
But how, exactly? Despite Dan’s best efforts, we have yet to convince either the American Political Science Association (APSA) or the International Studies Association (ISA) to transform themselves into Pokémon-style tournaments in which departments battle it out for ultimate dominance based on Google Scholar statistics, type, and abilities of their members.
And herein lies the basic difference between, on the one hand, academia, and, on the other hand, competitive sports and other games. The constitutive rules of the latter serve to sort winners from losers (note exception that proves the rule). In academia, the terminology of sports and games operates, at best, as metaphor.
Let’s return to the analogy with sabermetrics. Baseball, after all, is actually a game. It has clearly defined rules and a formal organization tasked with upholding those rules. The net result: when one steps into a ballpark, or puts on a pair of cleats and picks up a bat and ball, there exists little ambiguity about the point of the exercise. Playing baseball means trying to score more runs than one’s opponent, which leads to winning the game. Such certainty about constitutive rules makes it possible to ask which activities contribute most to that end. The rules define acceptable contributions, such as hitting the ball in such a way that fielders cannot catch it. We can both quantify those contributions and relate them to the ultimate outcome: scoring enough runs to win the game.
Sabermetrics, after all, amounted to an intervention into a game that already enjoyed existing measures of player performance and productivity. It sought to replace statistics like “runs batted in” and “hits with runners in scoring position” with measures like “on-base percentage plus slugging percentage.” Sabermetricians argued that their new measures assessed a player’s contribution to the goal of the game more accurately.
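For readers unfamiliar with the statistic, “on-base percentage plus slugging percentage” is usually abbreviated OPS. A quick Python sketch using the textbook formulas (our illustration; parameter names are ours):

```python
def ops(singles, doubles, triples, home_runs,
        walks, hit_by_pitch, at_bats, sac_flies):
    """On-base plus slugging (OPS): OBP + SLG, per the standard definitions."""
    hits = singles + doubles + triples + home_runs
    # On-base percentage: times reaching base / (AB + BB + HBP + SF)
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    # Slugging percentage: total bases per at-bat
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    slg = total_bases / at_bats
    return obp + slg
```

The design choice is the point of the sabermetric argument: unlike RBI, every input here is under the batter’s own control, so the composite tracks individual contribution rather than lineup context.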
Eventually this leads us to notions like “win share” as a comprehensive alternative to traditional measures of player performance. But the critical point here is that while the measurements changed, the overall goal did not: the rules of the game remained the same, and the point of playing it, to score more runs than the other team and thus win the game, continued unchanged.
Back to the h-index. Even if a scholar’s h-index reliably predicts publications and citations over a career, it remains unclear that this provides a reliable indicator of that scholar’s contribution to the overall point of the academic exercise. Does the scholar who amasses the most citations over a career “win” academia (“she who dies with the most citations wins”)? Does a department composed of scholars with high h-index numbers “win” higher education? What does “winning” even mean?
This creates two problems: one conceptual and one practical.
The conceptual problem, as the discussion Hendrix kicks off already demonstrates, is that citations remain, at best, a proxy for impact. So even if one argued that a major part of the academic vocation involved generating knowledge, it is unclear that citations provide a good way to measure that. Samuel Huntington’s The Clash of Civilizations gets cited a lot. But how many of us would regard that as an enduring contribution to knowledge? Ken Waltz’s Theory of International Politics gets cited a lot, but how many of those citations are by people highly critical of Waltz’s whole approach?
One might argue that this renders both works “impactful.” After all, if everyone feels it necessary to engage with them, then they matter, right? Or one might argue that this makes both books major contributors to ‘knowledge production’ in their role as foils. But both arguments imply a procedural view of ‘knowledge production’: knowledge production resides in the process of being cited, not in whether the work itself makes an independent contribution to knowledge. And they raise difficult counterfactual questions: would the effort spent on debating The Clash of Civilizations have been better spent arguing about other things? Would scholarship have looked better, or worse, if we hadn’t been debating Theory of International Politics?
We might disagree on these questions, but such debate pushes us back into the realm of quote-unquote subjective estimations. It highlights that whatever citation counts measure, it is perhaps only tangentially related to knowledge production. And being cited constitutes only one part of the academic vocation in any case; what about the “impact” a teacher exercises by helping her students learn how to think critically about international affairs, even if those students never cite her published work because they don’t go into academia? It is not obvious who ‘wins’ in that case: the highly cited scholar or the inspiring teacher.
This leads to the practical issue, which we might call the performativity problem inherent to these kinds of efforts to systematically assess impact and productivity. Once we establish a certain statistic as a metric for productivity, we change the game and give the players—which means: us and our colleagues—a new set of goals to aim for and rules to adhere to. The effort to measure scholarly productivity thus becomes a self-fulfilling prophecy, as we literally write and talk a revised set of scholarly norms into existence.
Unlike in baseball, where a new statistic once proposed has to be assessed for its actual contribution to the commonly-understood point of the game, productivity metrics in academia constitute novel goals and shape subsequent scholarly efforts in accord with them. Witness academic journals striving to increase their impact factors, or UK departments striving to do better in the REF/RAE process, or departments engaging in efforts to increase their ranking in the TRIP survey. This is nothing new, of course. An entire industry dedicates itself to helping students improve their scores on various standardized tests. Consultants and evaluators take up the same role with respect to academic institutions and departments. The means becomes an end in itself, and the measurement becomes a goal.
Now, this problem is less pronounced in professional baseball, precisely because players’ efforts to increase their OPS actually do contribute to their team’s proclivity to score runs. Do our efforts to publish pieces more likely to be cited actually contribute to knowledge, or to the education of our students and the broader public? Unless we are absolutely sure that they do, we should be cautious in our claims about any metric for scholarly productivity.
Indeed, consider a well-rehearsed process that obtains when creating metrics for activities with difficult-to-measure and difficult-to-define outputs: given a choice between “intangibles” and “numbers,” people almost always choose numbers, because numbers create a veneer of objective, unbiased measurement. Of course, what undergirds those numbers often entails a host of less-than-“unbiased” forces. When it comes to citations and publications, such forces include so-called citation cartels and other aspects of disciplinary politics. Such factors are inevitable, of course, but metrics such as the h-index don’t so much substitute for them as risk hiding their operation.
Again, this would prove less of a concern in the absence of performativity dynamics. But those dynamics mean that the more we rely on such measures, the more they will function to allocate academic status and prestige in ways that render them real. If we use productivity measures to allocate funding, for example, then more ‘productive’ scholars and departments will enjoy more resources, and thus prove better able, all things being equal, to succeed on those measures.
Before we know it, we’ve transformed academia into something that looks a lot like baseball.
But academia isn’t baseball. And it shouldn’t be.