This is a guest post from Daniel Mügge who is an associate professor of political science at the University of Amsterdam and the lead editor of the Review of International Political Economy.
In two recent posts, Cullen Hendrix, and Daniel Nexon and Patrick Thaddeus Jackson, have tabled important pros and cons of Google Scholar (GS) as a basis for measuring academic performance. And the flurry of reactions to their posts reveals just how central and touchy an issue this is.
The debate so far concentrates on gauging the “quality” of an individual scholar, and how different approaches are fraught with biases. But when we weigh the merits and demerits of something like GS, there is another level that has so far been ignored: the collateral damage that managerial tools of quality control do to the academic enterprise as a whole.
The problem is simple: to boost the productivity of planet academia, all its inhabitants are surrounded by an intricate net of incentives. The reactions are predictable (as economics has taught us): academics respond to these incentives. As intended, an expected pay-off supplants intrinsic motivation as the fuel for our work. Before you know it, every tweet we send, every op-ed we pen, every seminar we organize, is just a means to some higher end – normally advancing our “careers”, meaning higher-ranking posts at higher-ranking universities. The academic ideals with which we entered the ivory tower – science for the sake of science, or for the sake of society – get crushed on the way.
The point is not that we should not care about the quality of each other’s work, or of those whom we hire. And I too remain suspicious of biases in personal, subjective judgments, whether they concern tenure committees or article reviews. But algorithms as an alternative invite opportunistic behavior because they don’t just measure behavior but also steer it – the performativity Nexon and Jackson discuss. They do not merely move the goalposts; they remake them altogether. Why toil, research, write and publish? Simple. To boost your h-index.
Management of academics through targets and standardized metrics to boost competition goes back at least two decades. Stage one of the publication craze forced young academics to inflate their publication lists. The result? Overpriced edited volumes proliferated, never mind that they would languish unread in library stacks. Scholars recycled data and ideas endlessly, chopping them into the smallest publishable units. They created new journals not because of higher demand for articles, but because of skyrocketing supply of manuscripts. At the margin, the obsession with quantity also encouraged opportunistic co-authorship, with members of publication teams taking turns to “lead-author” the next article.
Academics’ responses to those incentive structures were as individually rational as they were collectively insane. The backlash against publication inflation ushered in stage two. Enter the impact factor (IF). Henceforth, the quality of the publication outlet should be decisive when gauging a paper’s worth. The pitfalls of IFs have been debated endlessly. Suffice it to reiterate those that relate directly to perverse incentives, mainly for journal editors. Once its IF determines the volume and quality of submissions a journal receives, both success and failure are self-reinforcing. The journal is either on the way up or on the way down.
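For readers who have never looked under the hood, a minimal sketch of the arithmetic helps explain why the IF is so easy to nudge. The snippet below assumes the standard two-year definition (citations received this year to items a journal published in the previous two years, divided by the number of citable items it published in those years); the function name and the numbers are purely illustrative.

# Minimal sketch of the standard two-year impact factor (illustrative only;
# the real Journal Citation Reports calculation involves more bookkeeping).
def two_year_impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    # Citations received this year to articles published in the previous two
    # years, divided by the citable items published in those same years.
    return citations_to_prev_two_years / citable_items_prev_two_years

# Hypothetical journal: 300 citations in 2015 to its 2013-14 articles,
# 150 citable items published in 2013-14 -> IF of 2.0.
print(two_year_impact_factor(300, 150))  # 2.0

Because it is a simple ratio, anything that pads the numerator (journal self-citations, courting highly citable pieces) or trims the denominator (classifying material as non-citable front matter) moves the score.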
Journal editors thus have incentives to rig the game: prioritizing mediocre scholarship by big names, stuffing new articles with cites to the journal, adapting publication schedules to boost the IF, etc. It is unclear how often editors succumb to these temptations, but it is undeniable that they are there. (Disclaimer: at the Review of International Political Economy, which I co-edit, we have consistently shunned such tricks.) Before you know it, academia perceives journals through the prism of a single figure: the IF – never mind that that figure is often driven by a handful of articles. These trends threaten to sap vital energy from journals. Want to climb the rankings? Stop publishing what you think others should read; instead, publish what you know they will cite.
In our present-day stage three, the focus shifts back to individual scholars. Articles have Altmetric scores slapped on them, measuring, for example, how much Twitter buzz your latest piece has triggered. Once again, the incentive is clear: to score highly, you have to build a faithful social media following. To please the metric, you start tweeting, build a profile on ResearchGate, and promote yourself on sundry other websites. Let’s be clear. There is nothing wrong with tweeting. But it is regrettable if scholars do it because they feel they have to, not because they find it useful for their work.
Academic management through incentives turns us into opportunistic cost-benefit analysts. People in the ivory tower tend to be smart: they are acutely aware of which kind of activity generates career pay-offs. Soon, everything has a merit-metric attached to it. Organize a faculty seminar? Three impact points. (Five if an Ivy League professor attends.) Op-ed? Four. More than 1k Twitter followers? Two.
We are caught in a cycle with no end in sight: new carrots and sticks are dangled in front of us, new forms of opportunism appear, criticism ensues, and the incentives are refined again, ushering in a new round. Rules for tenure and grant applications change from one year to the next as quality standards are rejigged continuously. No wonder burnout among young academics is on the rise.
It is easy to see where this will end. When you hire a new colleague, simply feed standardized CVs into assessment software, and your laptop will spit out your best hire. No need to worry that substance didn’t matter, because your own department will be assessed through a similar algorithm that – claiming objectivity – ignores substance as well. I appreciate that modern metrics may be less biased with respect to individuals than old routines, which relied on old boys’ networks, prestige, etc. But I still have a nagging feeling that, as we put academia under the yoke of scientific quality measurement, something crucial gets lost.
So what is to be done? As with any indicator, it’s important to grasp the limitations of what scientific metrics measure, and to take them with a grain of salt. But that in itself won’t stop their corrosive effects on academia as a whole because we face a collective action problem. When everyone else bases their judgments on these metrics, individual dissent quickly looks foolish. It is therefore up to those in the higher echelons of the academic hierarchy to turn the tide, because they can afford to ignore conventions and lead by example.
Journal editors have a responsibility to protect their professional verdicts and integrity against the lure of IF-boosting tricks. More generally, academics have to hone their personal, substantive judgment of their colleagues and their work. That takes time, and it takes active resistance against ubiquitous biases, for example those based on gender or PhD-granting institution. Scholars – young ones in particular – will only put their heart into their work if they know that that is what the rest of academia rewards.
I agree entirely and find this gamification of academia to be abominable. I despair that the values of thoughtful, critical, and committed scholarship-as-a-vocation have been lost amidst the measurements and motives of the market. As a grad student, I fully intend to change things once I’ve played the game long enough to reach a secure professional position, and once any grad students I eventually have all get jobs, since of course I’ll have a duty to help them play the game too…Yeah, once I’m emeritus, I’ll be the most vocal advocate for major reform away from metrics and structural incentives. Until then maybe I’ll just drink.
Not to mention those of us grad students who find the whole thing so unpalatable that even drinking can’t assuage the gaming abominable-ness and who will just join a circus… better job security.
This is going to come off as snarkier than I intend, but I worry a little about people having undue respect for current or past academic rules. Yes, hiring by algorithm would be nuts, but is it obviously worse than purely using old boys networks? Yes, impact factor gaming is bad, but again, is it worse than doing things off of prestige and social networks? Maybe my skepticism isn’t warranted, but I’m wary of arguments that pit new ways of doing things against the good old days when everyone pursued the life of the mind justly and nobly. I just have this nagging suspicion that the present and past are worse than people think—and any judgment about a future policy should take into consideration where we are now.
The ‘good old days’ stank too.
Citation ratings can be stoked by good ole boy networks. If you are from a lab with a lot of grads, you publish papers and each former graduate cites them: 10 former grads each citing each of your 10 papers once gives you an h-index of 10 before you even graduate. Add on the natural citations and you are easily into the 15-20 range, well above most people who have been in the field for 10 years or more.
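(For anyone who wants to check that arithmetic: here is a minimal sketch, assuming the standard definition of the h-index as the largest h such that h of your papers have at least h citations each, with the hypothetical numbers from the comment above.)

# Minimal sketch of the h-index: the largest h such that h papers have
# at least h citations each.
def h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Ten papers, each cited once by each of ten former lab-mates:
print(h_index([10] * 10))  # 10, before any "natural" citations arrive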
Not to be an even snarkier spoil-sport, but regarding evaluation…
I suspect that, whenever n(systems)>=2, most folks will sort themselves according to which system they believe will be most personally advantageous — and these preferences will be relatively impervious to public discussion.
We have come to the end of Bildung:
“Pursued one-sidedly,” du Bois-Reymond wrote, “science confines our glance to the immediate, tangible, certain result. It turns the mind away from more general considerations and disaccustoms it to move in the realm of the quantitatively indeterminate.” In one respect, he added, “this is the invaluable advantage that we prize, but where science reigns exclusive, the mind grows poor in ideas, the imagination in images, the soul in sensitivity, and the result is a narrow, dry, and hard mode of thought, forsaken by the muses and the graces.” Evidence for this abounded: technology dominated research, politics undermined aspiration, celebrity ruined posterity, and business stifled literature. “In a word,” du Bois-Reymond announced, “idealism has succumbed in the struggle with realism, and the kingdom of material interests has come.” He dubbed the problem “Americanization.”
From du Bois-Reymond’s essay “Civilization and Science,” 1877, quoted in Gabriel Finkelstein, “Emil du Bois-Reymond” (Cambridge; London: The MIT Press, 2013), 223.
Somebody please tell me why the problem of performativity is unique to quantitative metrics and evaluations based thereon.
I don’t think that it is, and I am not sure that anyone is saying that. The problem comes from reification, the confusion of those metrics with the goals of academia or of whatever is being measured. If academia becomes the citation-count olympics (not saying that is what you were advocating, but if hiring becomes that, the rest of academia would soon follow), then it changes in ways that I think hurt the enterprise and allow for more corporate control.
TL/DR version: this problem exists with any metric.
I would say not at all unique, but maybe the problem is that quantitative metrics and ‘quality algorithms’ (to paraphrase OP lightly) *are* unique in that they’re so much more predictable when compared with the old ways of doing things, and therefore more easily gamed/performatized.
This is not to endorse pedigree, old boys networks, and the like–only to say that I can learn how to boost my altmetric and GS citation count much more quickly and reliably than I can gain acceptance to an Ivy League program or the upper crust of influential IR scholars. And having successfully boosted such proxies, I could then endorse them in a way and to a level that would be blatantly careerist were it directed at actual coteries of scholars or institutions, and sad and fawn-ish if they hadn’t endorsed me back. Altmetrics and citation counts will always endorse you back; it’s in their constitution. And this certainty is a problem in terms of performativity.
The older ways have plenty of problems, but if we’re comparing their insidiousness with the insidiousness of quantitative measures, then the contingency and randomness of such old ways–which is a huge problem on an individual level (e.g. supervisor favors student A over student B because student A adheres more closely to supervisor’s research agenda)–at least inhibits full or perfect performativity on a larger scale.
Although I appreciate the argument that quantifiable metrics can at times be abused, create perverse incentives, or lack context about a particular case, the argument that we should abandon such metrics falls flat. Departments can and do evaluate scholars for hiring decisions, tenure, promotion, raises, and so on. Yet in the “good old days” before h-indices and the like, such decisions were made on gut feelings. People advanced because they had a “good” advisor, went to a “top” school, or published in the “right” journals, leading to its own form of performativity. These decisions were extremely subjective, and often led to bitter disputes in departments about what constitutes quality. It is nearly impossible to have any agreement on what constitutes good scholarship, since we all have our own predilections and biases. But it is hard to argue with citation metrics. Arguably, THE defining feature of good work is that others use and engage with it in their own scholarship–even if to refute it–thus leading to higher citation counts. Clearly, citations aren’t the only factor that should be considered in evaluating the worth of a scholar, and Cullen Hendrix wasn’t arguing that. But it is a useful piece of information to consider among many.