Measuring Journal (and Scholarly) Outcomes

19 June 2013, 1311 EDT

Another day, another piece chronicling problems with the metrics scholars use to assess quality. Colin Wight sends George Lozano's “The Demise of the Impact Factor”:

Using a huge dataset of over 29 million papers and 800 million citations, we showed that from 1902 to 1990 the relationship between IF and paper citations had been getting stronger, but as predicted, since 1991 the opposite is true: the variance of papers’ citation rates around their respective journals’ IF [impact factor] has been steadily increasing. Currently, the strength of the relationship between IF and paper citation rate is down to the levels last seen around 1970.

Furthermore, we found that until 1990, of all papers, the proportion of top (i.e., most cited) papers published in the top (i.e., highest IF) journals had been increasing. So, the top journals were becoming the exclusive depositories of the most cited research. However, since 1991 the pattern has been the exact opposite. Among top papers, the proportion NOT published in top journals was decreasing, but now it is increasing. Hence, the best (i.e., most cited) work now comes from increasingly diverse sources, irrespective of the journals’ IFs.

If the pattern continues, the usefulness of the IF will continue to decline, which will have profound implications for science and science publishing. For instance, in their effort to attract high-quality papers, journals might have to shift their attention away from their IFs and instead focus on other issues, such as increasing online availability, decreasing publication costs while improving post-acceptance production assistance, and ensuring a fast, fair and professional review process.

The story of the Impact Factor echoes those of many poorly designed metrics: it doesn’t seem to matter much how good the metric is, so long as it involves quantification. Institutions use impact factor as a proxy for journal quality in the context of library acquisitions, tenure and promotion, hiring, and even, as Lozano notes, salary bonuses.

At some institutions researchers receive a cash reward for publishing a paper in journals with a high IF, usually Nature and Science. These rewards can be significant, amounting to up to $3K USD in South Korea and up to $50K USD in China. In Pakistan a $20K reward is possible for cumulative yearly totals. In Europe and North America the relationship between publishing in high IF journals and financial rewards is not as explicitly defined, but it is still present. Job offers, research grants and career advancement are partially based on not only the number of publications, but on the perceived prestige of the respective journals, with journal “prestige” usually meaning IF.

For those readers who don’t know what “Impact Factor” is, here’s a quick explanation: “a measure reflecting the average number of citations to recent articles published in the journal.” We’ve discussed it at the Duck a number of times. Robert Kelley once asked, more or less, “what the heck is this thing and why do we care?” I passed along Stephen Curry’s discussion, which includes the notable line “If you use impact factors you are statistically illiterate.” I’ve also ruminated on how publishers, editors, and authors are seeking to exploit social media to enhance citation counts for articles. In short, Impact Factor is a problematic measure even before we get into how easy it is to manipulate, e.g., by encouraging authors to cite within the journal of publication.*
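To make that definition concrete: the standard two-year IF for a given year is the number of citations received that year to the journal's articles from the previous two years, divided by the number of citable items it published in those two years. Here is a toy sketch of the arithmetic, with entirely invented numbers:

    def two_year_impact_factor(cites_to_pub_year, items_by_pub_year, year):
        """Citations received in `year` to articles from the two prior years,
        divided by the number of citable items published in those two years."""
        cites = sum(cites_to_pub_year.get(y, 0) for y in (year - 1, year - 2))
        items = sum(items_by_pub_year.get(y, 0) for y in (year - 1, year - 2))
        return cites / items if items else 0.0

    # Hypothetical journal: 120 citations received in 2012 to its 2010 and 2011
    # articles (70 + 50), which numbered 75 in total (40 + 35) -> IF of 1.6.
    print(two_year_impact_factor({2010: 70, 2011: 50}, {2010: 40, 2011: 35}, 2012))

Note how little of an article-level story that single average tells, which is precisely Lozano's point.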

In this sense, Impact Factor is similar to, and implicated in, many of the other proxies we use to assess quality in academia: publications in top peer-reviewed journals, rankings of academic presses, citation counts, and so forth. They’re all badly broken. And enough of us know they’re badly broken that we’re in the zone of organized hypocrisy. I am willing to sacrifice some degree of accuracy to reduce bias in the short term, while pushing for change over the medium term.

With respect to Impact Factor, there is no shortage of alternatives. We’re seeing a variety of different approaches (via) to the evaluation of journal and scholarly impact. In terms of the former, the most conservative option on the table is to adopt an h-index approach (the calculation is sketched below). The fact that Google Scholar provides h5-index data for journals doesn’t hurt. For example, here are their current rankings in international relations:

[Chart: Google Scholar h5-index rankings for International Relations journals]

Readers will note that, in addition to all of the problems that plague Google Scholar’s database, the folks who run it should probably consult with more academics to construct their categories. The Journal of Conflict Resolution has an h5-index of 38, but doesn’t appear here.
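For readers unsure what the h5-index actually measures: it is the largest number h such that h of a journal’s articles from the last five complete years have at least h citations each. A minimal sketch of that calculation, run on made-up citation counts rather than anything from Google Scholar’s database:

    def h_index(citation_counts):
        """Largest h such that h items have at least h citations each."""
        h = 0
        for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # Made-up citation counts for a journal's articles from the last five years.
    print(h_index([50, 40, 12, 9, 8, 5, 4, 2, 1, 0]))  # prints 5

One design consequence worth noting: because the h5-index ignores everything above the threshold, a journal gets no credit for a handful of blockbuster papers, which is part of why it behaves differently from IF.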

My own view is that, if we’re going to rely on some kind of citation count, we should be doing moderately more sophisticated things. I know that a number of my friends and I already use IF and h-index data as a rough yardstick for individual articles. For example, we will look at whether an article’s citation count is higher or lower than the relevant IF for the journal it appeared in.
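As a rough illustration of that yardstick (with hypothetical numbers, since the real comparison depends on which citation database and which IF year you pull from), the check amounts to nothing more than:

    def versus_journal_if(article_citations, journal_if):
        """Crude yardstick: how does an article's citation count compare
        to the IF of the journal it appeared in?"""
        ratio = article_citations / journal_if
        verdict = "above" if article_citations > journal_if else "at or below"
        return ratio, verdict

    # Hypothetical article with 12 citations in a journal whose IF is 2.8.
    ratio, verdict = versus_journal_if(12, 2.8)
    print(f"{ratio:.1f}x the journal's IF ({verdict})")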

In general, I think that this kind of comparative analysis makes sense. We should be assessing both articles and journals in terms of marginal value, e.g., asking what the “citation premium” of appearing in International Organization is versus various lower-ranked journals. While there are significant statistical challenges to this kind of analysis, they aren’t all that different from the ones we normally confront in our research. Many of the relevant control variables are available in existing databases and via the alternative measurements I linked to earlier. Indeed, I suspect that this kind of work already exists; if I had more time on my hands today, I would already have found it.
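For what it’s worth, here is one sketch of what estimating that “citation premium” might look like, assuming you had assembled an article-level dataset with citation counts, journal, years since publication, and subfield. The file name, column names, and the simple log-linear specification are illustrative assumptions on my part, not a description of how anyone actually does this work:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical article-level dataset: one row per article, with its citation
    # count, journal, years since publication, and subfield.
    articles = pd.read_csv("articles.csv")  # file and columns are assumptions
    articles["log_cites"] = np.log1p(articles["citations"])

    # Journal fixed effects are the "citation premium" of interest: how much more
    # a paper gets cited for appearing in, say, International Organization rather
    # than the omitted baseline journal, conditional on age and subfield.
    model = smf.ols("log_cites ~ C(journal) + years_since_pub + C(subfield)",
                    data=articles).fit()
    print(model.params.filter(like="journal"))

The usual caveats about selection apply, of course: top journals publish papers that would have been highly cited anywhere, which is exactly the kind of statistical challenge the paragraph above has in mind.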

Thoughts?

—————-
*There is an alternative measure that removes journal self-citations, but nobody uses it. [back]