The following is a guest post by Dan Reiter, the Samuel Dobbs Candler Professor of Political Science at Emory University.
Dr. Cullen Hendrix’s recent Duck of Minerva post on citation counts sparked a vibrant discussion about the value of citation counts as measures of scholarly productivity and reputation.
Beyond the question of whether citation count data should be used, we should also ask how they are being used. We already know that, for better or worse, citation count data are used in many quantitative rankings, such as those produced by Academic Analytics and the National Research Council. Many journal rankings use citation count data as well.
A more specific question follows: how are departments using citation count data in promotion decisions, a topic of central interest to all scholars?
My Emory colleagues and I collected data on this question by fielding an email survey in Fall 2015, asking departments whether and how they use citation data in promotion decisions. Specifically, we issued 120 survey invitations to chairs of political science departments, as well as to a small number of chairs of social science departments and deans of public policy schools that contain political scientists. The intention was to survey all or nearly all departments in the United States for which a political scientist’s research record likely plays a major role in tenure and promotion decisions. The survey tended to exclude most liberal arts colleges, but did include liberal arts colleges that emphasize faculty research. The survey was short, only about ten questions, to reduce respondent fatigue and maximize response rate. A few weeks after the initial invitation was issued, a reminder email was sent to those individuals who had not yet responded.
We received 55 responses, a response rate of 46%. We received responses from 11 of the top 24 (as ranked by US News) political science departments (Emory is in the top 25, and we did not survey ourselves). Of those who did respond, 71% self-identified as being from public institutions, and 29% self-identified as being from private institutions.
The low response rate, and the likelihood that non-response was non-random, limit the external validity of the survey sample. That said, the data do suggest three patterns.
First, there are very few formal guidelines for the use of citation count data. 85% of respondents indicated there were no department, college or university guidelines for or prohibitions on the use of citation data for tenure decisions. A similar percentage (87%) indicated a lack of guidelines or prohibitions for decisions on promotion to full professor.
Second, departments are using citation data. For tenure decisions, 62.9% of respondents indicated that in the last five years, their departments had used citation data half the time (18.5%), frequently (22.2%), or all/nearly all of the time (22.2%). Similarly, for decisions on promotion to full professor, 68.5% indicated that in the last five years, their departments had used citation data half the time (16.7%), frequently (22.2%), or all/nearly all of the time (29.6%).
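For readers who want to check the arithmetic, the response rate and the combined usage shares reported above can be reproduced in a few lines. This is only a sketch using the figures stated in the post; the variable and function names are illustrative and are not part of the survey materials.

```python
# Figures as reported in the post; all names here are illustrative.
invitations = 120
responses = 55

# Response rate as a percentage of invitations issued.
response_rate = responses / invitations * 100

# Share of respondents (in percent) reporting each frequency of use
# over the last five years.
tenure_shares = {
    "half the time": 18.5,
    "frequently": 22.2,
    "all/nearly all of the time": 22.2,
}
full_shares = {
    "half the time": 16.7,
    "frequently": 22.2,
    "all/nearly all of the time": 29.6,
}


def combined_share(shares):
    """Sum the category percentages, rounded to one decimal place."""
    return round(sum(shares.values()), 1)


print(f"Response rate: {response_rate:.0f}%")
print(f"Tenure, half the time or more: {combined_share(tenure_shares)}%")
print(f"Promotion to full, half the time or more: {combined_share(full_shares)}%")
```

The three categories sum to the combined figures quoted in the text, and 55 of 120 rounds to the reported 46%.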
Third, departments are giving citation count data only limited weight in their tenure and promotion decisions. We asked respondents to estimate the weight their departments had given citation count data in the last five years in tenure and promotion decisions, using a 0-100 scale, 0 indicating no weight at all, and 100 indicating that citation counts were the only factor affecting the department’s decisions. For tenure decisions, across all respondents, the average answer was 10.89, and only four respondents gave an answer higher than 15. For promotion to full decisions, the average answer was somewhat higher, 16.7, with 15 respondents giving an answer higher than 15.
We also asked the open-ended question, “Please describe how citation counts have been discussed in department tenure and promotion decisions, if at all.” These responses provided relatively little additional insight. A small number of responses indicated that Google Scholar (GS) citation data are used; no other source of citation data was mentioned. None of the open-ended responses described specific department, college, or university guidelines on the use of citation data.
These results are encouraging, in that departments are not placing excessive weight on citation counts as measures of faculty productivity, but they are worrisome in that departments and universities are apparently not using citation data with sufficient care. Going forward, departments and universities might consider the following three recommendations for how they use citation data.
First, craft formal guidelines about the use of citation data, in particular to establish which citation databases should be used. The two main sources of citation data, GS and Web of Science, are vastly different, each with its own advantages and disadvantages (notably, GS citation counts are routinely far larger than Web of Science citation counts). Though most political scientists seem to rely on GS, it has some flaws and shortcomings. GS does not provide citation data for scholars who do not opt in to the system. A scholar can edit her own GS citation count data, creating the possibility of strategic manipulation of these data. GS is a very inclusive data set, and it includes a number of citations that might be viewed as “false positives,” such as scholar A’s citations listed under scholar B’s citation count because A and B have the same name, as well as trivial citations such as those on course syllabi posted on the internet. Institutions should consistently use a single citation data source to facilitate fair comparison of scholars across space and time, and should do so with full awareness of that data source’s flaws.
Second, institutions need to be cognizant of gender biases in citation count data. Articles in International Organization and International Studies Perspectives have shown that international relations articles authored by men are cited more frequently than articles authored by women. A 2015 sociology conference paper showed that across the subfields of political science, male authors tend to self-cite at a significantly higher rate than female authors, contributing to a gender gap in citation counts. Departments must keep the gender citation gap in mind to avoid incorporating gender bias into tenure and promotion decisions.
Third, inter-scholar comparisons should be made very carefully. Departments should avoid comparing scholars of different professional ages, as of course citation counts increase over time. In particular, junior faculty are likely to have very low citation counts because of the lag between when an article is published and when other articles get published that cite the article. Departments should also be wary of making comparisons of citation counts across subfields, because scholars in some subfields have much higher citation counts than scholars in other subfields. For example, Dr. Thomas Pangle is a 1972 Ph.D., and one of the field’s leading political theorists. As of December 2015, he had garnered 1718 GS citations. By comparison, according to data collected by Dr. Hendrix, the mean predicted number of GS citations (in 2015) for an IR scholar receiving his or her Ph.D. in 1990 (that is, an individual who is 18 professional years younger than Dr. Pangle) is 2686.
Citation counts are measures. As we do when using measures in our political science research, we should examine and use citation count measures with care.
Amanda Murdie is Professor & Dean Rusk Scholar of International Relations in the Department of International Affairs in the School of Public and International Affairs at the University of Georgia. She is the author of Help or Harm: The Human Security Effects of International NGOs (Stanford, 2014). Her main research interests include non-state actors, and human rights and human security.
When not blogging, Amanda enjoys hanging out with her two pre-teen daughters (as long as she can keep them away from their cell phones) and her fabulous significant other.
Great post Dan. I share your concerns about GS and would add to them somewhat — e.g., that GS seems to measure “impact” (popularity) more than “quality”. (For details: I just published about this concern at ISQ.) I am somewhat comforted by your finding that departments report that they don’t place much emphasis on GS metrics, but I wonder how much to believe them. Do you have a hunch about whether GS metrics might influence decisions subconsciously, and therefore have a bigger effect than is self-reported?