Debating the Benefits of Nuclear Superiority, Part III

28 March 2013, 1116 EDT

Editor’s Note: Back in February I riffed on a post by Erik Voeten in which Erik discussed two articles in International Organization (IO). One, by our colleague Matt Kroenig, argued that nuclear superiority gives states advantages in crisis bargaining (PDF). The other, by Todd Sechser and Matthew Fuhrmann, rejected this claim (PDF).

After the two posts sparked some interesting discussion – both on- and offline – I approached all three about doing a mini-symposium at the Duck of Minerva. They agreed.  Kroenig kicked us off with objections to Sechser and Fuhrmann, and soon after we ran Sechser and Fuhrmann’s critique of Kroenig’s article. In this post, Sechser and Fuhrmann respond directly to Kroenig’s earlier post.

First, we want to thank Dan Nexon and the folks at the Duck of Minerva for the opportunity to participate in this important exchange.

The key question in our debate with Matthew Kroenig is whether nuclear weapons (or nuclear superiority) are credible and effective tools of coercion.  Nuclear weapons may be useful for deterrence, but can they also coerce?  Our theories reach opposite conclusions: we say no; Kroenig says yes.  Both sides marshal evidence to support their arguments.  So who is right?  Our goal in this post is to evaluate Kroenig’s empirical results and respond to his critique of our article.

We begin with Kroenig’s response. We appreciate his engagement with our research, but his harsh criticisms generate more heat than light.  Kroenig’s critique of our article boils down to two basic points:

1. The MCT dataset produces puzzling results, so it must be flawed.

Kroenig’s most forceful argument is that the dataset we use in our study – the Militarized Compellent Threats (MCT) dataset – produces “bizarre” results and therefore should be discarded.  Specifically, two factors turn out to be poor predictors of crisis outcomes in our analysis: conventional military power and crisis stakes.  Instead, intra-crisis signaling is much more important for explaining who wins and loses. Even though this is precisely what contemporary models of crisis bargaining have been telling us for years (see, for example, Morrow and Fearon), for Kroenig this result is so strange that it warrants abandoning the dataset altogether.

First, Kroenig neglects to mention that his study finds exactly the same thing: namely, that conventional military capabilities and crisis stakes do not accurately predict who wins and loses, whereas intra-crisis behavior does. So his objection here is puzzling.

Second, scholars have known for decades that powerful states are not necessarily more successful – and may actually be less successful – in coercive diplomacy.  This is not difficult to understand: leaders of weak states have many incentives for resisting coercion by powerful challengers, such as reputation-building and domestic political gain. Our results are thus quite consistent with previous scholarship.

Kroenig likewise errs in complaining that “nothing is correlated with compellent success” in our results. In fact, as we have already mentioned, our findings show that signals of resolve clearly matter.  And other work using the MCT dataset points to the importance of alliances, conflict history, trade interdependence, geography, and even psychological factors.


Most importantly, however, Kroenig’s instincts here are badly misguided. Empirical findings should be judged by the quality of the data.  If the data are appropriate to the question at hand, and if they were collected using valid procedures, then we can be confident in our findings.  Yet Kroenig suggests that social scientists have had it backwards all along: instead of using data to decide which theories to reject, he would have us use our theories to decide which data to reject – and any results that are uncomfortable or puzzling should be thrown out.

This argument is troubling – and wrong. New datasets often yield new findings, and sometimes those findings conflict with existing views. We judge data according to the procedures that governed how they were collected, not by whether we agree with the results. Kroenig’s only complaint about the MCT dataset, however, is that the findings do not confirm his preconceived notions.  This is not a valid reason for discarding data.

2. We have no theoretical explanation for our findings.

In fact, we make two theoretical arguments, which we summarized in our initial post. First, nuclear weapons aren’t useful for taking territory and other objects.  Thus, the target of coercion doesn’t need to worry that the coercer will simply use its nuclear weapons to seize the item in dispute.  Second, and more importantly, executing a nuclear threat would entail enormous costs for the coercer – costs that are almost certainly not worth paying given the stakes of most coercive episodes.  This makes any coercive nuclear threat inherently dubious.

Kroenig disagrees, appealing to the authority of other scholars who, he claims, have “demonstrated” that these credibility problems can be overcome. This is inaccurate.  Many scholars have indeed speculated about how coercive nuclear threats might be made credible – but, beyond a few illustrations, they do not claim to demonstrate the empirical validity of their arguments.

Moreover, many of these studies (including Kroenig’s) suggest that nuclear weapons make coercion more effective even without the kind of explicit brinkmanship behavior discussed by Schelling and others.  Our study aimed to evaluate this claim – and we found it to be incorrect.

Nuclear Superiority and the Balance of Evidence

We now turn to Kroenig’s study.  In our previous post, we noted that Kroenig’s evidence is curious because it initially seems to contradict his own argument.  In his dataset of 20 nuclear crises, the nuclear-superior and nuclear-inferior sides prevailed at similar rates. In other words, nuclear superiority seems to offer little help in predicting crisis outcomes.

Yet Kroenig manages to extract from these data “strong and unambiguous” evidence that nuclear-superior states are more likely to win crises. How?  A closer look reveals the answer: a combination of inappropriate data and questionable methods.  We focus on four of the biggest problems below.

1. Observation inflation.

Kroenig confronts a basic challenge in his empirical analysis: nuclear crises are rare.  Specifically, he has only 20 nuclear crises in his dataset (drawn from the ICB dataset). Yet he winds up with 52 observations, enough to generate a statistically significant correlation.  How does he obtain such a large dataset from such a small set of crises?

The answer is that Kroenig simply duplicates each observation in the dataset, so as to double its size. A single observation for the Cuban Missile Crisis, for example, now becomes two independent events in his dataset: a victory for the United States, and a defeat for the Soviet Union.  This is inappropriate: the two observations are measuring the same event. Kroenig is not actually observing more data here; he is simply reporting the same event twice.  This is equivalent to an exit poll that lists each respondent twice in the sample – once voting for candidate X, and once voting against candidate Y – and then claims to have twice the sample size.
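To see the mechanics, here is a minimal simulation (in Python, with synthetic data; nothing below comes from Kroenig’s actual dataset). Duplicating every row leaves the coefficient estimates untouched, but it mechanically shrinks the standard errors by roughly a factor of 1/√2, so a weak relationship suddenly looks statistically significant:

```python
# Synthetic illustration of "observation inflation": listing every observation
# twice adds no information, but it shrinks standard errors and p-values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20                                   # roughly the number of actual crises
x = rng.normal(size=n)                   # a stand-in predictor
y = 0.3 * x + rng.normal(size=n)         # a weak true relationship plus noise

X = sm.add_constant(x)
original = sm.OLS(y, X).fit()

# "Inflate" the dataset by duplicating each row.
inflated = sm.OLS(np.concatenate([y, y]), np.vstack([X, X])).fit()

print(original.bse[1], inflated.bse[1])          # SE falls by roughly 1/sqrt(2)
print(original.pvalues[1], inflated.pvalues[1])  # p-value drops, with no new data
```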

Even with this doubling, however, Kroenig’s dataset is probably still too small to sustain his conclusions.  Econometrics textbooks tell us that without roughly 100 observations, the kinds of models Kroenig uses are unstable and unreliable. With just 52 observations, Kroenig’s findings therefore remain suspect.
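The textbook worry is easy to demonstrate with a quick Monte Carlo sketch (again synthetic, with a true coefficient we pick ourselves): at 52 observations, maximum-likelihood estimates of the sort Kroenig’s models produce swing widely from sample to sample, while at several hundred observations they settle down near the true value:

```python
# Monte Carlo sketch: logit coefficient estimates are far noisier at n = 52
# than at n = 500. The true coefficient (0.5) is an arbitrary assumption.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
TRUE_BETA = 0.5

def fitted_betas(n, reps=500):
    estimates = []
    for _ in range(reps):
        x = rng.normal(size=n)
        p = 1.0 / (1.0 + np.exp(-TRUE_BETA * x))
        y = rng.binomial(1, p)
        try:
            fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
            estimates.append(fit.params[1])
        except Exception:  # perfect separation can make small-sample fits fail
            pass
    return np.array(estimates)

for n in (52, 500):
    b = fitted_betas(n)
    print(n, round(b.mean(), 2), round(b.std(), 2))  # spread shrinks as n grows
```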

2. Questionable measurements of the nuclear balance.

A second problem is that Kroenig divides crises into multiple “dyads” (pairs of states), so his measure of the nuclear balance – his key explanatory variable – is strictly dyadic.  For observations containing the Soviet Union and Great Britain, for example, Kroenig calculates the dyadic nuclear balance using only Soviet and British nuclear weapons.  The problem is that since 1949 the United States has been bound by treaty to defend Britain in the event of war.  During the Cold War crises in Kroenig’s dataset, the United States explicitly brought its nuclear capabilities to bear on behalf of Britain and other allies. Cold War historians would be rather surprised to learn that the Soviets never considered U.S. capabilities when interacting with Britain – yet this is precisely what Kroenig assumes.


An example illustrates the problem.  In the 1961 Berlin Wall crisis, the Soviet Union prevailed over the United States and its allies.  Because the United States enjoyed nuclear superiority but lost, the case initially seems to contradict Kroenig’s theory.  But Kroenig breaks the crisis into separate dyads, thus creating four observations in which the Soviet Union is the nuclear-superior state (the Britain and France dyads, each counted twice under the duplication described above).  This is an inaccurate depiction of the nuclear balance, of course, because Britain and France relied on the American nuclear arsenal as well as their own.  Their defeat therefore should count as a failure for nuclear superiority, not a success. But Kroenig’s approach ignores this critical alliance relationship, thereby transforming a case that refutes his theory into four cases that support it.

This has a significant impact on the overall findings: when we account for the impact of alliances on the nuclear balance, the effect of nuclear superiority disappears.
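A toy calculation shows how much the coding choice matters. The warhead counts below are placeholders, not historical estimates; the point is only that a strictly dyadic measure crowns the Soviet Union “superior” over Britain and France, while folding in the allied American arsenal reverses the coding:

```python
# Hypothetical arsenals for the 1961 Berlin Wall crisis (placeholder numbers).
arsenals = {"USSR": 300, "USA": 1500, "UK": 50, "France": 5}
allies_of = {"UK": ["USA"], "France": ["USA"], "USA": [], "USSR": []}

def dyadic_superior(a, b):
    """Strictly dyadic coding: compare only the two states' own arsenals."""
    return a if arsenals[a] > arsenals[b] else b

def alliance_adjusted_superior(a, b):
    """Alternative coding: add each side's allied arsenals before comparing."""
    strength = lambda s: arsenals[s] + sum(arsenals[ally] for ally in allies_of[s])
    return a if strength(a) > strength(b) else b

for target in ("UK", "France"):
    print(target,
          dyadic_superior("USSR", target),             # USSR looks superior...
          alliance_adjusted_superior("USSR", target))  # ...until the U.S. counts
```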

3. Data errors.

A third reason to doubt Kroenig’s findings is that his dataset contains dozens of errors.  In particular, a key variable is badly garbled.  This variable is central to Kroenig’s statistical procedure, which involves “clustering” standard errors (we discuss the problem in more detail here). To be sure, Kroenig’s main findings survive when this problem is corrected, but their statistical strength declines.
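For readers unfamiliar with the technique, a small sketch (synthetic data once more) shows why a garbled clustering variable is not a cosmetic problem: cluster-robust standard errors depend entirely on which observations share a cluster ID, so scrambling those IDs changes the standard errors, and with them the significance tests, even though the coefficients stay exactly the same:

```python
# Clustered ("sandwich") standard errors with correct vs. scrambled cluster IDs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_crises, sides = 26, 2                        # 52 observations, 2 per crisis
cluster = np.repeat(np.arange(n_crises), sides)
x = rng.normal(size=n_crises * sides)
crisis_shock = rng.normal(size=n_crises)[cluster]   # shared within each crisis
y = 0.4 * x + crisis_shock + rng.normal(size=n_crises * sides)

X = sm.add_constant(x)
correct = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": cluster})
garbled = sm.OLS(y, X).fit(cov_type="cluster",
                           cov_kwds={"groups": rng.permutation(cluster)})

print(correct.params[1] == garbled.params[1])  # identical coefficients
print(correct.bse[1], garbled.bse[1])          # different standard errors
```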

4. Inappropriate data.

Even without these problems, it is not clear that Kroenig’s analysis would refute our argument.  The reason is that many of the cases in his dataset tell us nothing about nuclear coercion.

Consider, for example, the 1964 Congo crisis, which Kroenig counts as a “victory” for the United States.  In this crisis, the United States helped ferry Belgian paratroopers into Congo, where the Belgians rescued hostages held by Soviet-supported rebels.  The Soviets denounced the operation, and so Kroenig codes the United States as the “victor” over the Soviet Union – an apparent triumph for nuclear superiority.  But the United States made no coercive demands, and the Soviets made no concessions.  Calling this a victory for nuclear superiority is dubious, to say the least.

Kroenig argued in his earlier post that the ICB dataset is widely used and has proven its value over time.  We agree that the dataset is useful for many things.  But as scholars we must pay careful attention to what is contained in the datasets we employ. The ICB dataset may be widely used, but that does not make it appropriate for every research question.

Problems like this are an important reason we opted to use the Militarized Compellent Threats dataset.  Like any dataset, it has important limitations and is by no means perfect.  But one advantage is that, in contrast to ICB and other commonly used conflict datasets, it contains only coercive threats. It thus paints a more reliable picture of the utility of nuclear coercion.

The stakes in the nuclear coercion debate are high, and passions can run hot. As social scientists, we therefore have an obligation to conduct research based on sound scientific principles, and to exercise care in the way we collect and interpret data. We have done our best to apply these principles here, but at the same time we are under no illusion that ours is the final word.  We look forward to watching the debate continue to evolve as other scholars bring new data and methods to bear on a question of vital scholarly and policy importance.