Open Data, Open Sesame?

1 September 2011, 0057 EDT

In an article last week in the Financial Times on “Sex, Lies, and the Pitfalls of Overblown Statistics,” John Kay bluntly wrote: “Always ask yourself the question: where does the data come from?”

It’s a good question, and one I frequently ask myself when I read yet another story about the hottest craze in the international development aid business today: the Open Data Initiative of the World Bank.

Don’t get me wrong. I think the World Bank’s Open Data initiative is freaking awesome. You can now get free access to the Bank’s World Development Indicators (hitherto accessible, but by no means free unless you were a privileged academic in a research university that purchased a subscription to the database). You can now get extensive information on the Bank’s financial activities, including lending data that previously you could only attempt to compile by patiently wading through turgid annual reports in PDF or hardcopy format. And now you can even get comprehensive information on the Bank’s project-level activities, including links to multiple project documents. The World Bank is really the only international aid agency that has achieved this kind of transparency (kudos!). The Bank even created a new staff position, with the title “Open Data Evangelist” (hello, Tariq!). In short, the Open Data Initiative is a goldmine for aid nerds like myself, and I am both impressed and grateful.

But is the World Bank’s Open Data Initiative really capable of “democratizing development” in the way envisioned by Bank President Robert Zoellick? Is open data going to be a panacea for the transparency and accountability deficits that have undermined international aid’s legitimacy and effectiveness for decades? Will open data really empower the poor?

It is entirely too easy to get swept up in the romantic swell of the Open Data movement. Yet the idealism underpinning Open Data leaves me uneasy. Why? Here are some of my half-baked musings on the subject:

1. Beware the Data Deluge. More is not always better. Opening the data spigots is great, but opening the data floodgates….?

2. Making data available to the public is not the same as helping the public to interpret – and interrogate – the data. I don’t mean to be patronizing here. But there is, to my mind, a tension between making data highly accessible to non-expert users and making data available in a way that is fully honest and transparent about where the data came from, how data are produced (and by whom), what margins of error exist, and how savvy users might replicate the data to ensure validity. After all, especially in the field of international development, the data environment can best be characterized as spotty, uncertain, unreliable and often nonexistent. But all these caveats and nuances often get lost when data are boiled down into easy-to-use dashboards and interfaces that spit out fancy graphs, geomaps, and so forth. Ooooooh…..data……. ahhhh…..

3. Cui bono? Who is using the data? So far, all we know is that the World Bank’s website is getting a gazillion hits per day. But who is accessing the data, what data are they accessing, and what are they doing with it? Numbers are powerful tools of persuasion. They can be distorted and misused, often to malevolent ends. (For interesting work on the politics of data, see Peter Andreas and Kelly Greenhill’s edited book, Sex, Drugs and Body Counts: The Politics of Numbers in Global Crime and Conflict.)

4. Going beyond the use of data, what about the production of data? I fear that development data gurus sometimes confuse the democratization of the data end supply with the democratization of data production. I was socialized enough into the constructivist/critical theory mold during the bygone days of graduate school to realize that social science data does not appear out of thin air, nor can it be discovered in a test tube. In other words, somewhere along the way someone has to make pretty subjective calls about what numbers are important to collect, how to collect them, what to ignore (and how to control for that), and how to aggregate and interpret the results. A lot happens in the data production cycle that gets wistfully swept into some footnote or appendix that no one reads (if it appears in print at all). I guess my message here is that I hope the open data gurus start to turn a more critical eye towards questions regarding the data production and supply chain.

All said and done, I think the World Bank’s open data revolution is a huge step forward and should be picked up by other aid organizations (as well as by what now seems to be a global open government movement). But in the process I also look forward to a whole new debate about the politics of open data.