John Mearsheimer and Stephen Walt have written a piece that is critical of the supposed move to hypothesis testing and the failure of IR folks to do grand theory. I have many reactions to this development, which I thought I would put into a bit of a listicle:
- My first reaction was: Next title: why too much research is bad for IR….
- As folks pointed out on twitter and in facebook discussions, it seems ironic, to say the least, that someone who made a variety of testable predictions that did not come true (the rise of Germany after the end of the cold war, conventional deterrence, the irrelevance of international institutions, etc.) would suggest that testing our hypotheses is over-rated or over-done.
- When I was preparing for my comprehensive exams long ago, I worked with a member of my cohort who was a Political Theorist just trying to get through the process. He would just read everything and ask “what would Ken Waltz think of this?” Well, invoking the WWWD mantra here, I think he might wonder why M&W are writing this stuff when they could be producing yet more Grand Theory or more Grand Theorists. Waltz produced Walt after all ….
- Which leads to the next question: what does this complaint say about their students? Either they failed their students (their students did not learn to do good grand theory) or the students have failed them (their students have focused on stuff other than grand theory). I know a good number of their students, and their work is often quite terrific and influential, so I am confused.
- If M&W have failed to re-generate themselves, it could be because they and their generation of grand theorists have answered all of the big questions, leaving us with the small questions and the dirty work of testing hypotheses. Perhaps they should be happy that their work is done and ride off into the sunset?
- Perhaps the utility M&W really seek to maximize is citations (given what they say at the end of the piece, I guess I am wrong here…). I became convinced in the early 1990s that producing controversial work seemed to be more important than producing convincing work. Mearsheimer’s piece blasting the “False Promise of Institutions” seemed to be citation-bait to me. Similarly, Walt’s article finding fault with the move towards formal theory seemed aimed not so much at convincing people but at attracting counter-attacks. [It is interesting that their latest piece cites approvingly the Fearon 1995 IO piece on Rationalist Explanations for War that Walt considered old wine in new bottles way back when].
What really frustrates me is that their claims make them bad realists and make me a Marxist. How so?
As realists, they think that power matters greatly in international relations–determining not just outcomes but interests. But M&W seem to ignore the role of power in the Political Science profession and especially in the IR economy. Who controls the commanding heights of the IR profession? Those who run the major journals. Those who serve as editors of series at the major presses. Those who run or influence the major fellowship-granting, post-doc-giving institutions. Those who work at the most prestigious institutions and thus have access to the smartest students, to the largest endowments, to the media, and to the policy world. Mearsheimer and Walt seem to forget that they are among the most powerful and influential figures in our profession, yet they often feel so oppressed that they support the Perestroika movement and complain when institutions do not hire their students.*
The important thing to keep in mind is that M&W have a great deal of power in the profession, that folks have often feared their ire, and yet they feel as if the field is passing them by, focusing more on middle range theory (which can be tested) than on grand theory (which can always be rationalized). In their article, they assert that theory is downgraded, and I would respond that grand theory may be a bit passé at the moment but there is plenty of theory out there. Because Realism is so indeterminate, because Liberalism is such a broad school (of which I think I am a member), and because constructivism is not really a paradigm with a shared core set of logics, we need more theory, not less, to develop clear expectations that can be subjected to tests. I think, ultimately, the complaint here is not about theory vs. hypothesis testing but grand theory versus everything else (just as Walt’s complaint about formal theory conflated rational choice theory with formal modeling).
In terms of specific gripes about their piece:
- they avow that they are scientific realists (as opposed to instrumentalists) in terms of epistemology–that assumptions must be realistic, not useful fictions. Ok, fine: rational actor assumptions for individuals can be considered realistic, but rational actor assumptions for states? Probably closer to useful fictions than something that can “be shown to be right or wrong.” So, I am confused.
- I am confused why they digress into the scientific realism vs. instrumentalism epistemology discussion at all, since it does not really connect directly to their complaint about theory vs. hypothesis testing. I can imagine a two-by-two with the four combinations of work (scientific realism and theory, SR and hypothesis testing, instrumentalism and theory, and instrumentalism and hypothesis testing), and each box is chock full of IR articles/books.
- Instrumentalists do not believe that process tracing is a useful way to test theories? Oh, this is where this stuff matters. They are going to argue that process tracing is on the wane? I am really confused now because there is plenty of work that “tests hypotheses” via process tracing–do the causal processes avowed in the theory play out in reality as we trace the course of events? They cite the TRIP report stuff a lot in this piece (just as Satan can quote scripture), but it is not clear that the real numbers on the work being done today show that process tracing is less in style than before.
- They go on to list the virtues of theory. Which is cool and useful and completely familiar to anyone who graduated from a PhD program in the past forty years or so.
- Their discussion of “hypothesis testing” seems pretty insulting to me: people have questions about a phenomenon, such as war; they identify variables that might be relevant; then they identify a dataset; and then there is testing… I love that this starts with “At the risk of caricature.” It is a caricature. Yes, some people start with problems as opposed to starting with a grand theory they want to play with. But their choice of variables is not just reaching into a random bag of variables (well, mostly); it involves considering the existing theory, extending/developing/inventing one’s own theory, and then indeed testing.
- This contrast they set up reminds me of something that actually does exist, or did way back when: that there were two schools of IR, coastal (Chicago is on a lake, so it counts as coastal) and midwestern, with the former focused on generating theory and the latter focused on testing theory. There was something to this distinction, but over the years the coastal folks have gotten more interested in the construction of datasets to test their hypotheses (note that David Lake and the rest of the bandits at UCSD are on a coast), and the midwestern folks who used to reach into the Correlates of War dataset to assess whether there were correlates … of war are now doing some very interesting theoretical work that they then test.
- They assert that the hypothesis testers are not focused on the microfoundations of the work: “little intellectual effort is devoted to creating or refining theory; i.e., to identifying the microfoundations and causal logics that underpin the different hypotheses. Nor is much effort devoted to determining how different hypotheses relate to one another.” Really?
- I love the fact that they use Fearon to attack Huth and Russett when Fearon would appear to be just as guilty of being an instrumentalist hack at times (see his work on Insurgency and Civil War that has been most influential) under the M&W definitions of instrumentalist hackery.
- This article here is reminiscent of other stuff I have seen that attacks quantitative work but calls it something else. Yes, the data is not great, but there are problems with process tracing as well. Indeed, if we find correlations despite shaky data, it might mean the relationships are actually that much more convincing.
- Lack of cumulation as a problem for the hypothesis testers? It seems to me that the grand theory debates of the 1980s and 1990s had limited cumulation. After all, Mearsheimer was revising Realism back to Morgenthau with his focus on the quest for power rather than security. How is that cumulative?
- Citing the democratic peace stuff here is kind of funny since that finding led to a heap of theoretical competition–each new entrant into this debate was compelled by the existence of a core empirical finding to develop distinct theories and then test them in new ways. This was not hypothesis testing that just adds a few variables but real thinking about the causal connections between democracy and peace.
- They blame the expansion of IR PhD programs for this focus on hypothesis testing. Any institution can train students in methodology but so few can attract the brains who can think creatively and theoretically. It would be tempting to point out that one can see many amazingly sharp people who have creative juices aflowing who were so poorly trained that they could not articulate a research design, but I shall refrain.
- Not sure that this is true: “privileging hypothesis testing creates more demand for empirical work and thus for additional researchers.” This is so anti-Moneyball: if we do go too far down a hypothesis-testing path, wouldn’t the folks who are the grand theorists be that much more rare and thus special and appreciated? I do think our market is a bit self-correcting (again, they make me a Marxist, damn it): the fetish for formal modeling has worn off, the fad for the most high-tech methods has run its course, and now we have a fad of experiments.
- I love one of the very last lamentations in the piece: “Instead of relying on “old boy” networks, a professionalized field will use indicators of merit that appear to be impersonal and universal. In the academy, this tendency leads to the use of “objective” criteria—such as citation counts—when making hiring and promotion decisions.” Are they actually saying that old boy networks are better than trying to use more objective criteria? That citation counts are causing the move away from grand theory? Sure, the person who invents a great dataset gets heaps of citations, but so do the folks who come up with some great theory. Or at least folks who come up with theories/articles that people pay attention to (perhaps they should do some Google Scholar searches to see who is getting the most cites–the theorists or the folks they would accuse of being inductive hypothesis testers–but that would involve … dare I say it … hypothesis testing). As someone who has lamented being Rudolph, left out of the reindeer games, I can say this: the sooner we come up with objective criteria instead of relying on old boy** networks, the better.
- “What matters is one’s citation count, not helping outsiders understand important policy issues.” Um, how does grand theory help outsiders understand important policy issues better than middle range theory?
- “Academic disciplines are socially constructed and self-policing; if enough IR scholars thought the present approach was not working, they could reverse the present trajectory.” Ah, realism does not apply to the discipline because it has to do with persuasion and not power.
- “Emphasizing quality over quantity in a scholar’s portfolio might help.” They apparently do not follow Political Science Job Rumors, which focuses oh so much on the importance of getting into APSR, IO, and the top presses. I wonder: if you did some data collection and assessed who was getting tenure where, would it be about citation counts or hits in the top outlets or neither? If we just let the old boy network sort this out, I am sure things would be fine.
- Great conclusion: “The study of IR should be approached with humility.” Indeed. Perhaps this piece is not aimed at maxing citations but at unintentional comedy?
Most folks are still doing their own thing–some quant, some qual, some realist, some not, some focused more on generating theory and some more focused on testing competing theories. Our discipline is actually a pretty big tent. As a result, one’s view of it depends on where one sits at the circus: people notice the folks who are unlike themselves, with lots of confirmation bias affirming their sense of being in the minority. One can always feel like an outsider even when one is standing astride some of the most important institutions in the profession.
*This is not just a realist thing. I remember seeing Peter Katzenstein complain at an APSA or ISA about how oppressed constructivists were at a time when they were getting the best jobs and the best post-docs and getting their books and articles published in the best outlets.
** Oh, and note that while they probably meant nothing by the “boy” in “old boy,” one might ponder whether old boy networks might just be a wee bit sexist.
M&W obvs haven’t yet read the lead article in the Dec ’12 ISR, about which I have a post in preparation.
Or it could be because they wrote their paper before that was published. How current do you think research needs to be? It is almost impossible to respond to “they should have read this.”
Was a late-night comment, mostly tongue in cheek, anon.
P.s. Skimming through some of the M&W piece, I like the quote from Schrodt about regression coefficients bouncing around like a bunch of gerbils on methamphetamines. Nice image. However, I was skimming too quickly to pass judgment on the piece.
Re citation count, don’t forget that Walt and Mearsheimer have no idea what they are talking about. H-index is not a raw count of citations. It’s funny they explicitly defend the old boys networks though. Perhaps the students who sold their intellectual futures to Walt and Mearsheimer aren’t doing that well these days? https://www.tdaxp.com/archive/2013/01/06/science-paradigms-and-the-old-boy-network.html
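For what it’s worth, here is a minimal sketch of that point, using entirely made-up citation counts: the h-index and the raw citation total can rank the same two (hypothetical) scholars differently, so the two measures are not interchangeable.

```python
# Minimal sketch, with made-up numbers: the h-index is not a raw citation
# count. Two hypothetical scholars with the same citation total can have
# different h-indexes.

def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical per-paper citation counts.
one_big_hit = [100, 2, 1, 1]        # total = 104, h-index = 2
steady_output = [26, 26, 26, 26]    # total = 104, h-index = 4

print(sum(one_big_hit), h_index(one_big_hit))        # 104 2
print(sum(steady_output), h_index(steady_output))    # 104 4
```

The one-blockbuster profile and the steady-output profile have identical totals but different h-indexes, which is exactly why conflating the two measures is a mistake.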
I’m a little bit confused as to their emphasis on good theory being subject to falsification beginning on page 11. Doesn’t scientific realism dispense with the search for ‘black swan’ instances because disconfirming evidence can always be attributed to an unrecognized intervening variable in a causal mechanism? In other words, doesn’t scientific realism reject falsification and opt for a less confident position of fallibilism? Can either PTJ or Collin Wight clarify this for me?
Not much to clarify except that neither Walt nor Mearsheimer are scientific realists in any philosophically defensible sense of the term. They’re neopositivists plain and simple, which is why they valorize falsification — but they have bad conscience about it, which is why they lament the rise of precisely the kind of work that their own methodology implies is necessary. Actually becoming scientific realists would, in the first instance, require them to give up on falsification in favor of something more nuanced.
Can’t you be a scientific realist of the convergent realism variety (that which Laudan purports to have confuted) and still basically operate under a neopositivist mindset?
Only at the expense of logical coherence.
Thanks PTJ. I feel like if I say ‘philosophy of science’ into a mirror three times you will appear and critique someone’s methodological surety.
I have left a more substantive comment on DrTdaxp’s post, but here will say that Tdaxp’s statements that the M&W piece is “absurd” and “ridiculous” are themselves absurd. Even a 5-minute scroll-through of the M&W piece suffices to establish that while it may be wrong or questionable on some points, it isn’t “ridiculous,” at least not to anyone who has a passing familiarity with the styles of research that appear in the IR journals. They are describing a particular kind of research that they don’t think much of. You can debate that, but the debate is not advanced by labeling the piece “nonsense,” “absurd,” and “ridiculous,” as Tdaxp does.
In my view the absurdity of the piece is that here we have a pair of soft neopositivists complaining about the harder-edged, more philosophically coherent neopositivism that has bypassed them and their work. It’s the same syndrome as the whole “qualitative methods” thing in US PoliSci, which is basically a quixotic campaign to keep case-comparative work at the small-n level when there is precisely zero philosophical justification for doing so. Once you say that you’re interested in nomothetic generalizations, there is no alternative to using the largest n to test the most robust hypotheses available with the current level of technology. The alternative — to give up on nomothetic generalization in favor of something really different, like dispositional mechanisms or ideal-typical baselines — is not something that M&W consider, either here or elsewhere.
If I have time tomorrow I will post my own thoughts on the piece as a separate blog entry.
A great many people would like to read that post.
Will try in the morning.
It’s very late so I can’t write at *great* length (plus I still have only skimmed the M&W piece; I was busy composing a long post of my own today about something else — not yet posted).
That said, I don’t esp. like this formulation: “Once you say that you’re interested in nomothetic generalizations, there is no alternative to using the largest N to test the most robust hypotheses available with the current level of technology.” I think there is no reason someone can’t do small-N work w/ the aim of generalizing — as long as sufficient humility-caveats are put on the generalizations. Not everyone knows exactly what a dispositional mechanism is, but everyone knows what a generalization is, and ‘nomothetic generalization’ is actually not that useful or revealing a phrase since *all* generalizations are by definition something other than idiographic.
What you are basically saying here, ISTM, is that no one is allowed to do small-N work unless he or she wraps it up in a whole lot of verbiage about context-dependent causal (or dispositional) mechanisms or ideal-typical baselines, etc. And application of this criterion would end up throwing some decent work into the trash. Imagine two works with several empirical chapters and a theoretical one preceding them: if the empirical chapters are pretty much the same in both cases, how much difference does it make whether the theoretical preface talks about generalizations, or mechanisms, or ideal-typifications? Oh sure, it makes a *lot* of difference to the person writing it (as a dissertation, say). But does it make all that much difference to the reader, presuming the thing is eventually published? I sometimes wonder…
[ok, now I’m going to run for cover]
I stand by what I said, and have said on many occasions: “qualitative” methods are a lifestyle choice, not a defensible strategy of inquiry. To be interested in producing nomothetic generalizations and then to resist ways of making those generalizations more robust is either perversely contrarian or nakedly technophobic.
“Imagine two works with several empirical chapters and a theoretical one preceding them: if the empirical chapters are pretty much the same in both cases, how much difference does it make whether the theoretical preface talks about generalizations, or mechanisms, or ideal-typifications?” All the difference in the world, because if the author means what they say in the theoretical chapter the empirical chapters *can’t* be “pretty much the same.”
Everyone does not know what a generalization is, as evidenced by the fact that some people seem to believe that there is a meaningful methodological difference between a small-n and a large-n generalization. That’s precisely my point: there’s not, so if you have the means to do a large-n generalization, why stop with a small-n generalization?
I’m sure it will come as a shock to everyone that my reaction (expressed in an email to Steve) on this point was more or less identical to PTJ’s. You can’t advocate quasi-statistical standards of theoretical adjudication and then complain when the statisticians implement those standards better than you do.
That being said, there’s a much more charitable reading of the paper: one in which “grand theory” isn’t a cover for “not expressed in the language of math” and the complaint really is with largely a-theoretic inductive work. But I think that argument needs to be made from a different standpoint and, as I’ve said before, the problem isn’t the lack of theory, but poor theorization. And one of the solutions to poor theorization is greater tolerance of descriptive work. See: https://www.whiteoliphaunt.com/duckofminerva/2012/03/professionalization-and-poverty-of-ir.html
Anyway, I’m thinking it might be getting time to gussy up the PTJ/Nexon paper as a DoM “working paper” document. That way, if it is rejected, it is at least out there.
Well, I agree with some of the above, and I think I should refrain from further substantive comment on the M&W paper and the methodological etc. issues it raises until I have properly read the paper. That would seem prudent, to say the least.
I would note however for the record that my initial objection was to a blog post by Tdaxp (who commented in this thread) referring to the paper as “nonsensical” and “ridiculous,” which seemed a little over the top. Similarly the commenter Ralph’s remark in this thread (above) that M&W are “dying dinosaurs trying desperately to cling to influence and power that has long escaped them” seems excessive.
Certainly there’s plenty to criticize M&W for. I think — to pick up on something you (PTJ) said above in response to Eric Van R. — M&W don’t do nuance esp. well. And M’s ‘Tragedy’ is a *very* flawed book, imo. (On the other hand, Walt’s ‘Taming American Power’, for what it is, is not bad.)
Hey LFC,
Did you have any substantive criticisms of my post (other than my use of “paradigm,” which I stand behind)? I replied to your comment at tdaxp, but it seems you’ve just taken to repeating yourself here without engaging in conversation.
“seems you’ve just taken to repeating yourself here without engaging in conversation”
Sorry if I gave that impression. In terms of substance, I also disagreed with your statement that IR is scientific to the extent that it can predict, control, and improve state behavior. Bit too instrumental a definition of ‘science’ for my taste.
Of course, I agree that saying a field is scientific to the extent to which it achieves its end results puts the cart before the horse. I agree with Walt & Mearsheimer that a sensible goal for IR is to inform policy makers, in other words to assist in controlling and improving outcomes. Prediction is likewise required, as otherwise we’re in just-so story land.
It is unclear to me why prediction and just-so stories are opposites. The opposite of a just-so story is an explanation (since that gives reasons why instead of just describing what), but explanations are only equivalent to prediction if you’re a neopositivist.
PTJ,
“It is unclear to me why prediction and just-so stories are opposites.”
One is a testable component of science.
The other is sophistry.
Whether or not this makes them “opposites” is up to you to decide. I’m not sure I care.
“The opposite of a just-so story is an explanation (since that gives reasons why instead of just describing what), ”
An odd definition of opposite (perhaps you mean “complement”?), but that’s neither here nor there.
“but explanations are only equivalent to prediction if you’re a neopositivist.”
Why would you think such a thing?
All that PTJ means is that a prediction is a claim that “X will happen [perhaps conditional upon Y]” while a just-so story is a ‘bad’ account of why we should accept the prediction or why it was correct. The idea that prediction & explanation are the same thing only exists among some philosophers of science. PTJ doesn’t think they are.
This shouldn’t be a big deal, and I suspect it is in danger of hijacking the underlying discussion.
“All that PTJ means is that a prediction is a claim that “X will happen [perhaps conditional upon Y]” while a just-so story is a ‘bad’ account of why we should accept the prediction or why it was correct”
I agree with this. The point that confused me was the claim
“but explanations are only equivalent to prediction if you’re a neopositivist”
which doesn’t flow from that.
A serious question (not rhetorical): Have you guys encountered model fit as a quantitative measure of correctness separate from measures of the % of variance explained (e.g. r2 and its variants)? (If not, the statement that explanations are equivalent to prediction if you’re a neopositivist would seem to make sense.) Or am I misunderstanding what a ‘neopositivist’ is? (Another serious q.)
You may be misunderstanding what a neopositivist is because I’m telegraphing answers instead of taking the time to really spell things out. Briefly: neopositivism is a particular account of science and scientific knowledge that places a premium on falsification as the predominant, and perhaps the exclusive, standard by which a claim ought to be evaluated: good claims fail to be falsified, bad claims are falsified. There is a lot of underlying philosophical-ontological baggage here about the mind-independence of the world and the limitation of knowledge to the in-principle experienceable, and it is only in the light of that baggage and those presumptions that the whole hypothesis-testing procedure makes sense. Likewise, the logical equivalence of explanation and prediction depends on this same perspective, because only if a valid explanation is an empirically general claim that fails to be falsified does it look like prediction at all. Explanations that don’t look like that — explanations that aren’t neopositivist, but are realist or analyticist or critical — have zippo implications for prediction, and indeed most work in those other traditions eschews prediction as something of a pipe dream, at least in the open system of the actual world outside the lab.
All r2 and other goodness-of-fit measures actually capture is the extent to which the model captures the variation in the data. Philosophically, that’s continuous with the coefficients in the equation measuring the degree of correlation, so I don’t see that this makes any difference. “How much does my model mirror the world?” is the basic neopositivist question, and the only way to cash that out is to gather a lot of data and see whether the model fits it in general, and how much. My point — the broader point that Dan was worried might get lost in the discussion — is that “science” and “neopositivism” aren’t equivalent, and not all scientific claims take the form of testable hypotheses.
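To put a number on that point, here is a minimal sketch with simulated data showing r2 as nothing more than “how much of the variation in the observed data the fitted model captures.” Everything in it (the fake data, the linear model, the seed) is illustrative only, not anyone’s actual analysis.

```python
# Minimal sketch, with simulated data: r^2 measures only how much of the
# variation in the observed outcomes the fitted model reproduces. It says
# nothing about whether the model's causal story is right.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)  # noisy fake "world"

# Ordinary least-squares fit of y on x (np.polyfit returns slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# r^2 = 1 - residual variation / total variation.
ss_res = float(np.sum((y - y_hat) ** 2))
ss_tot = float(np.sum((y - y.mean()) ** 2))
r_squared = 1.0 - ss_res / ss_tot
print(f"r^2 = {r_squared:.3f}")  # close to 1: the line captures most variation
```

A high r2 here just means the line tracks the scatter; a structurally wrong model fit to a big enough dataset can score equally well, which is the philosophical point at issue.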
this post (lots of little quibbles, no big idea) = proof of grand theory’s demise
Great post. One thing to keep in mind is that the fixation with, and influence of, M&W diminishes greatly outside the borders of the US. This is an international field and most of the discipline recognizes these two as dying dinosaurs trying desperately to cling to influence and power that has long escaped them. Students are not that interested in their work, most scholars find that their work doesn’t help them answer any of the big questions they focus on, and their ‘boys network’ is antiquated at best, and an embarrassing, sexist relic of days LONG gone by at worst. This latest piece should be seen as a rather pathetic and disconnected swan song rather than something worth engaging with on any serious level.