A couple of years ago, Human Rights Watch launched a report arguing for a treaty ban on fully autonomous weapons, claiming military robots that target and kill human beings would violate international law. Among other arguments, the report Losing Humanity cited the Martens Clause of the Hague Convention, which encourages states to consider the “dictates of the public conscience” in determining whether as-yet-ungoverned activities are morally permissible among civilized nations.

Intrigued by this advocacy claim, I conducted a survey in 2013 to test whether public revulsion at the idea of “killer robots” was as widespread as Human Rights Watch claimed. I found that indeed a majority of Americans – including a majority of military personnel – oppose the idea of such weapons. And those who were “not sure” how they felt also leaned in favor of a precautionary principle.

In Research and Politics this month, Michael Horowitz presents new survey data purporting to prove that public opinion is not quite so “against” the idea of fully autonomous weapons as it first appeared. In his paper, “Public Opinion and the Killer Robot Debate,” he replicates a version of my earlier question but also adds a couple of experimental conditions, prompting respondents to assume that US troops will be protected from attack, and to assume that autonomous weapons would be more effective than other alternatives in such a scenario. After priming the respondents in this way, Horowitz reports that in fact, “rather than being widespread, public opposition to autonomous weapons is contextual, as with nuclear weapons”: that is, respondents who were “primed” were somewhat less averse to killer robots. From this rather unsurprising finding, he then infers that the “public conscience” is less opposed to autonomous weapons than believed and therefore the Martens clause argument may not apply: “It is too early,” Horowitz pronounces, “to argue that AWS violate the public conscience provision of the Martens clause because of public opposition.”

I’m delighted to see folks taking replication seriously, and building on my earlier and very preliminary study. But in my view Horowitz’ finding does not support his wider claim, for three reasons.

1) Horowitz’ experimental methods are inappropriate for measuring the public conscience. When you ask survey respondents leading questions, you can almost always get them to answer more or less how you want, regardless of what they actually believe. As Justin Lewis writes in his book Constructing Public Opinion: How Political Elites Do What They Like and Why We Seem to Go Along With It, surveys can “play a role in constructing an understanding of what people think and what they want.” Donald Kinder and Lynn Sanders showed in a 1990 paper entitled “Mimicking Political Debate With Survey Questions” that “Policy descriptions used within a survey can affect the expression of an opinion.”

That’s why a cardinal rule of thumb in eliciting people’s baseline opinion on something in an objective way is not to prime them at all. Indeed this is one of the first rules a survey consultant at a professional firm like YouGov will mention in a consultation – and did mention when I contacted them for help with my project. In my study, therefore, we did not coax people into replying one way or another but simply asked every respondent “how they felt” about autonomous weapons, and then invited them to tell us why in their own words.

Consider: if we had primed respondents to be afraid of autonomous weapons through “experimental scenarios” where the weapons went haywire, turned on their masters, or ended up slaughtering civilians indiscriminately, Horowitz would have been the first to cry foul. Instead, the best indicator of what the public thinks is what they say when they are not primed at all, and when survey questions are phrased in the most matter-of-fact terms rather than leading the respondents. Horowitz claims that it is problematic to ask people questions “in a vacuum.” Actually, the exact opposite is true: the context of survey answers should be inferred from respondents’ explanations of their answers, not built into the questions themselves.

2) Even if Horowitz’ experiment made sense, the study is biased in favor of killer robots. Beyond the baseline, Horowitz’ experimental conditions are drawn purely from pro-AWS hypotheticals. If one were going to do an objective “contextual” study, as Horowitz claims to do, on the conditions under which people change their support for killer robots from the baseline, one would want to include experiments on all the relevant scenarios we already know (from the open-ended questions in my earlier survey) cause such attitudinal change – like civilian deaths, or robots going haywire, or fears of a slippery slope toward greater likelihood of war. Instead, by conducting experiments only on conditions such as “our soldiers’ lives are at stake” and “the weapons are more effective,” Horowitz encourages respondents to consider hypothetical pros of the weapons as if they are fact, while disregarding all the potential cons. But he does not do the reverse: prime pro-AWS respondents to consider the possible drawbacks of AWS. This is a classic example of a survey doing precisely what Lewis, Kinder and Sanders caution against.

To see what a difference this makes, consider the issue of nuclear weapons, to which Horowitz compares his findings on autonomous weapons. Horowitz took for his inspiration a study on public opinion toward nuclear first-use by Daryl Press, Scott Sagan and Benjamin Valentino, and this study has recently been replicated with slightly different conditions by Brian Rathbun and Rachel Stein in a working paper presented at Yale University this month. Press, Sagan and Valentino’s original article, beloved of Horowitz, explored whether people were quite as horrified by nuclear weapons (as Nina Tannenwald has claimed) when presented with a scenario in which the national survival was at stake – a nuclear version of the “ticking time-bomb” scenario. Press, Sagan and Valentino found (predictably) that willingness to use nuclear weapons increased somewhat under such conditions. But when Rathbun and Stein replicated this study (adding in experimental questions about the harm caused to enemy civilians) they found that the willingness to break the nuclear non-use norm depended greatly on the number of civilians affected – something that Press, Sagan and Valentino didn’t even bother to measure.

How different would Horowitz’ findings have been had he included questions about unwanted side effects of autonomous weapons, or any moral concerns, in his experiment? As of now, we simply can’t know. Rather than taking place “in a vacuum” where respondents could be free to interpret the context according to their own concerns – precisely what is being measured – Horowitz primes his subjects to think purely in terms of utilitarian considerations and disregard morality altogether.

3) For that very reason, Horowitz’ study does not actually measure the public conscience at all. Rather it measures the conditions under which people will deviate from baseline moral concerns for pragmatic reasons, when invited to do so by an ‘expert.’ Given earlier studies on the social construction of public opinion through surveys, this finding is not at all surprising. But nor does it prove the absence of a normative taboo against a practice. When presented with a ticking-time-bomb scenario (and an assumption built into the survey question that torture is effective) some percentage of the US public will be likelier to state in a survey response that they would support torture. Yet as Paul Gronke, Darius Rejali and their collaborators show, even when presented with such scenarios favorable to torture support, and even though some respondents were swayed, a majority of Americans still opposed torture throughout the years of the Bush Administration. The public conscience remained a consistent bulwark against purely interest-based calculations, even if public opinion was swayed at the margins by utilitarian concerns. Indeed, that is the whole point of a “conscience” – it is not that pragmatic, interest-based concerns don’t matter and don’t interrelate with morality. It is that morality can, should and often does pose a constraint against the untrammeled pursuit of those interests at all costs.

Indeed, in the end, Horowitz’ own data on autonomous weapons proves his assertion wrong. The fact that so many survey respondents (50%) remain strongly opposed to autonomous weapons even after Horowitz primed them to assume (perhaps wrongly) that the weapons are both vitally necessary and unswervingly effective only goes to show how reluctant Americans are to sanction the killing of human beings by machines. The best data in Horowitz’ study on popular attitudes toward AWS is not his experimental data but his baseline data, where people answered a question similar to that in my survey with no contextual priming. Even though he uses a somewhat biased convenience sample instead of a nationally representative sample, his baseline data confirms rather than refutes my original finding: more people oppose killer robots (48%) than support them (38%) and a significant number (15%) are unsure. In the absence of priming, the public remains more skeptical than enthusiastic about killer robots.

What Can Be Concluded?

Horowitz’ study does present two important points for those pondering the “dictates of the public conscience” as a basis for developing new international law. The first is that it’s important to examine why people think as they do, not just what they think. People may be for or against a policy for a combination of moral and pragmatic reasons, but the “dictates of the public conscience” standard in international law refers specifically to normative beliefs among “civilized” publics, not simply to what they might support for pragmatic reasons. For example, some proportion of autonomous-weapons supporters actually believe that such weapons would better protect civilians than human soldiers can. This is a moral argument, not a pragmatic one about protecting our own national interests.

The right way to get at this “context,” however, is not through priming but rather through qualitative analysis of open-ended survey answers. By asking respondents not only how they feel but why, one allows the public to express for themselves precisely what, if anything, they find shocking or bothersome or humanitarian about specific behaviors in war. In fact that is precisely what my original study did. By collecting and coding open-ended answers, we discovered that the vast majority of those who support autonomous weapons do not do so in order to protect civilians or for any other humanitarian reason, but rather in order to protect our own military assets. By contrast, for those who feared autonomous weapons the reverse was true: they tended to cite moral and ethical principles, alongside some pragmatic dystopian fears. Public opinion may be split on autonomous weapons, but the public conscience clearly favors caution in developing such technology.

The second, more important lesson of Horowitz’ piece is one he himself acknowledges: that public opinion is malleable and subject to manipulation both by political elites and by survey theorists. Sarah Kreps has shown the same thing in her study of public opinion polls about drones. Such polls, she argues, have largely inflated public support for drones by building false assumptions into the questions (such as that the policy is legal and effective, both of which are actually key points of debate). Kreps’ study shows that when you ask the same questions without those assumptions, public support for drones drops. When the questions allow consideration of civilian casualties, it drops further. The point is – and Horowitz’ study shows – that we should be very skeptical of surveys claiming to measure public opinion as an intervention into war law debates.

But this does not mean, as Horowitz implies, dismissing the value of the public conscience as a check and balance against military technology run amok. Rather, it means we should hold such research to the highest standards of methodological non-bias, and separate wheat from chaff. We should look closely at any data and ask whether it measures what it says it is measuring. We should ask whether respondents are being primed in subtle ways to support the preferred positions of policy stakeholders. As scholars, we should aim to measure the salience of international norms among ordinary people by asking them neutral, open-ended questions instead of priming them with hypotheticals. And consumers of such studies should especially beware of “ticking-time-bomb” thought experiments designed to “prove” public support for highly contested policies.