What is Coding, Anyway?

18 August 2009, 1239 EDT


One of my tasks since getting back from hiatus has been to wade through political science journals that piled up over the summer. In the April issue of PS: Political Science and Politics, I discovered this little gem: a one-page “article” entitled “Picturing Political Science” which consisted of the following:

“What do political scientists study? As part of a larger project, we coded every article in 25 leading journals between 2000 and 2007. We then created a word cloud of the 6,005 titles using https://www.wordle.net. The 150 most-used words appear in the word cloud. The size of each word is proportional to the number of times the word is mentioned. Draw your own conclusions.”

I really like the idea that a mainstream political science journal would legitimate a Wordle cloud as a genuine piece of scholarship. And in that vein, let me engage with the piece on its methodological and conceptual merits.

Leave aside the questionable link between the image and the title of the “piece.” (Are titles alone really the best indicator of the content of a work of scholarship? My recent experience with my book publisher suggests not.) What got me was the claim in the “abstract” of the piece that the authors “coded every article” in these journals. What in the world do they mean by “coding”? The authors do not tell us, and do not share any coded data with us. Did they simply mean they extracted the titles, pasted them into a Word document, and plugged them into Wordle?

If so, that’s not coding, and claiming that it is only spreads confusion about the meaning of the term. (Which may be considerable. The Wikipedia page on “coding in the social sciences” is little more than a stub at present – consider this a call for concerned academics with more time on their hands than I to flesh it out with citations and nuance.)

Broadly speaking, coding is the act of categorization for the purposes of analysis (it’s not the same as just counting frequencies of terms).
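To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the three titles are invented, the stopword list is arbitrary, and the subfield labels stand in for judgments a human coder would make from a codebook. The first half is roughly the kind of tally a word cloud rests on; the second half is a (very small) coded dataset.

```python
from collections import Counter

# Hypothetical article titles, invented for illustration.
titles = [
    "Democracy and War",
    "War and Trade",
    "Voting in New Democracies",
]

# Frequency counting: purely mechanical. No judgment is made about
# what any article is actually about.
STOPWORDS = {"and", "in"}  # arbitrary, illustrative stopword list
word_counts = Counter(
    word
    for title in titles
    for word in title.lower().split()
    if word not in STOPWORDS
)

# Coding: a reader assigns each title to a category of meaning.
# These labels are hypothetical human judgments, not computed output.
subfield_codes = {
    "Democracy and War": "international relations",
    "War and Trade": "international relations",
    "Voting in New Democracies": "comparative politics",
}
category_counts = Counter(subfield_codes.values())

print(word_counts.most_common(3))
print(category_counts)
```

Note that the word count treats “democracy” and “democracies” as unrelated strings, while the coded categories only exist because someone decided what each title meant.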

Researchers working with quantitative datasets “code” as they prepare datasets for statistical analysis, when they determine (for example) that conflicts with fewer than 1000 battle deaths constitute a 0 and those with 1000 or more constitute a 1 in a spreadsheet. A process of interpretation (not simply an automated frequency count of words) is involved in analytically converting historical records to numbers.
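The threshold rule just described can be sketched in a few lines of Python. The conflict names and death counts below are invented for illustration, and the function name `code_conflict` is my own. The one-line rule is the easy part; the interpretive work lies in deciding on the threshold and in settling on a battle-death figure from the historical record in the first place.

```python
# Coding rule from the text: fewer than 1000 battle deaths -> 0,
# 1000 or more -> 1.
BATTLE_DEATH_THRESHOLD = 1000


def code_conflict(battle_deaths: int) -> int:
    """Return the binary code for a conflict under the threshold rule."""
    return 1 if battle_deaths >= BATTLE_DEATH_THRESHOLD else 0


# Hypothetical records; the figures are made up for this example.
conflicts = {
    "Conflict A": 250,
    "Conflict B": 1000,
    "Conflict C": 48000,
}

coded = {name: code_conflict(deaths) for name, deaths in conflicts.items()}
print(coded)
```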

For those using qualitative methods and working with text rather than numbers, “coding” involves assigning categories of meaning to specific passages of text (interviews, focus group transcripts, blog posts, news articles, or something else). The method can be entirely interpretive, as when a graduate student goes through a stack of Security Council resolutions with different colored highlighters. Or it can be more rigorous: a detailed codebook is designed so that independent coders can apply the annotations separately from the principal investigator, and statistics such as Cohen’s kappa are used to measure how reliably different coders agree. It can involve sorting documents into stacks on an office floor, or it can involve sophisticated, layered annotations on a text file using advanced qualitative data analysis software, documenting the analytical process so that others might replicate one’s work.
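For readers who have not met it, Cohen’s kappa compares the agreement two coders actually reached with the agreement they would be expected to reach by chance, given how often each used each category: kappa = (observed - expected) / (1 - expected). Below is a minimal sketch in plain Python. The function name, the category labels, and the two coders’ annotations are all hypothetical, and real projects would more likely rely on a statistics package than on hand-rolled code.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labeling the same passages."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)

    # Share of passages on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's category frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1:
        # Both coders used one and the same category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six passages by two independent coders.
coder_a = ["threat", "threat", "aid", "aid", "aid", "other"]
coder_b = ["threat", "aid", "aid", "aid", "other", "other"]

print(round(cohens_kappa(coder_a, coder_b), 3))
```

In this made-up example the coders agree on four of six passages, but because some of that agreement would occur by chance, kappa comes out at 11/23, a little under 0.48.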

So as we discuss what political scientists “do,” let’s just not cheapen the term “coding” by using it too loosely. And let’s not cheapen the significance of tools like Wordle in the profession by implying they do something they do not.