Archive for February, 2012

Barbara Fister’s review

Barbara Fister at Inside Higher Ed has a thoughtful review up.


My article in New Scientist on messiness

The February issue of New Scientist is running an op-ed I wrote for them about why messiness in knowledge is a good thing. Messiness scales.

Here’s the link to the page…that will let you pay to read the article.


[2b2k] Moynihan: On the other hand…

My friend Daniel Sheerin in the State Department’s eDiplomacy group (where I sadly recently ended my second and final year as a Franklin Fellow — what a great group!) sent me a quotation from Sen. Daniel Patrick Moynihan to balance the quotation I use in Too Big to Know and just about whenever I talk about knowledge.

The quote I’ve been using is: “Everyone is entitled to his own opinion, not his own facts.” I like it first because it’s put so well, but mainly because it expresses a promise that knowledge has made to us in the West: If we can just sit down and look at the facts, all reasonable people will agree. I think the Net is showing us that that’s not a promise that can be kept.

But of course that’s not the only thing Moynihan said on the topic. Dan points to this from the Senator: “I fear that rationality is but a weak foil to the irrational. In the end we shall need character as well as conviction.”

Much better! But, I’m not convinced that character + conviction is going to win the day, and since Moynihan was in politics, I suspect that he agreed. Today the formula is probably more like: Character + Conviction + $5,000,000 ad budget.

BTW, I’ve always liked Bertrand Russell’s remark (approximately): “One cannot be argued out of a position that one was not argued into.”

And also BTW, Dan highly recommends Moynihan’s letters.


[berkman] From Freedom of Information to Open data … for open accountability

Felipe L. Heusser [pdf] is giving a Berkman lunchtime talk called “Open Data for Open Accountability.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

How has the open Web been changing accountability and transparency? Felipe is going to share two ideas: 1. The Web is making the Freedom of Information Act (FOIA) obsolete. 2. An open data policy is necessary to keep freedom of information up to date, and to move toward open accountability.

Lots of people praise transparency, he says. There are multiple systems that benefit from it. Felipe shows a map of the world that shows that most parts of the world have open government policies, although that doesn’t always correlate with actual openness. We continue to push for transparency. One of the cornerstones of transparency policy is freedom of information regulation. In fact, FOIA is part of a long story, going back at least to 1766 when a Finnish priest introduced a bill into the Swedish parliament. [Entirely possible I heard this wrong.]

Modern FOI laws require governments to react to requests and to proactively provide information. (In response to a question, Felipe says that countries have different reasons for putting FOI laws in place: as a credential, to create a centralized info system (as in China), etc.) Felipe’s study of 67 laws found five clusters, although overall they’re alike. One feature they share: They heavily rely on reactive transparency. This happens in part because FOI laws come out of an era when we thought about access to documents, not about access to data. That’s one big reason FOI laws are increasingly obsolete. In 2012, most of the info is not in docs, but is in data sets.

Another reason: It’s one-way information. There’s no two-way communication, and no sharing. Also, gatekeepers decide what you can know. If you disagree, you can go to court, which is expensive and slow.

In May 2009, launched. The US was the first country to support an open data policy. Sept. 2009 the UK site launched. Now many have, e.g., Kenya and the World Bank. These data are released in machine-readable formats. The open data community thinks this data should be available raw, online, complete, timely, accessible, machine processable, reusable, non-discriminatory and with open licenses.

So, why are these open data initiatives good news? For one thing, it keeps our right to FOI up to date: we can get at the data sets of neutral facts. For another, it enables multiway communication. There are fewer gatekeepers you have to ask permission of. It encourages cheap apps. Startups and NGOs are using it to provide public service delivery.

Finally, Felipe runs an NGO that uses information to promote transparency and accountability. He says that access to open data changes the rules of accountability, and improves them. Traditional gov’t accountability moves from institutional and informal to crowd-sourced and informal; from a scarcity of watchdogs to an abundance of watchdogs; and from an election every four years to a continuous benchmark. We are moving from accountability to open accountability.

Global Voices started a project called Technology for Transparency, mapping open govt apps. Also, MySociety, Ushahidi, Sunlight Foundation, and Ciudadano Inteligente (Felipe’s NGO). One of CI’s recent apps is Inspector of Interests, which tries to identify potential conflicts of interest in the Chilean Congress. It relies on open data. The officials are required to release info about themselves. CI built an alternative data set to contrast with the official one, using open data from the Tax and Rev service and the public register. This exposed the fact that nearly half of the officials were not publishing all their assets.

It is an example of open accountability: uses open data, machine readable, neutral data, the crowd helps, and provides ongoing accountability.
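The kind of cross-referencing described for Inspector of Interests can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the officials, the assets, and the `undeclared_assets` function are all hypothetical, and nothing here reflects how CI actually built its app or what its data looks like.

```python
def undeclared_assets(declared, registered):
    """Map each official to the registered assets missing from their own declaration."""
    gaps = {}
    for official, assets in registered.items():
        missing = set(assets) - set(declared.get(official, ()))
        if missing:
            gaps[official] = sorted(missing)
    return gaps


# Hypothetical data: what officials declared about themselves...
declared = {
    "Official A": {"Company X"},
    "Official B": {"Company Y", "Farm Z"},
}
# ...versus what an independent data set (e.g., a public register) shows.
registered = {
    "Official A": {"Company X", "Company W"},
    "Official B": {"Company Y", "Farm Z"},
}

gaps = undeclared_assets(declared, registered)
share = len(gaps) / len(registered)  # fraction of officials with a gap

print(gaps)
print(f"{share:.0%} of officials have undeclared assets")
```

The point of the sketch is that neither data set has to be designed with the comparison in mind. The gap only shows up once the two are put side by side.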

Now Felipe points to evidence about what’s going on with open data initiatives. There is a weird coalition pushing for open data policies. Gov’ts have been reacting. In three years, there are 118 open data catalogs from different countries, with over 700,000 data sets. But, although there’s a lot of hype, there’s lots to be done. Most of the catalogs are not driven at a national level. Most are local. Most of the data in the data catalogs isn’t very interesting or useful. Most are images. Very little info about medical, and the lowest category is banking and finance.

Q: [doc] Are you familiar with miData in the UK that makes personal data available? Might this be a model for gov’t?

Q: [jennifer] 1. There are no neutral facts. Data sets are designed and structured. 2. There are still gatekeepers. They act proactively, not reactively. E.g., has no guidelines for what should be supplied. FOIA meets demands. Open data is supplied according to what the gatekeepers want to share. 3. FOIA can be shared. 4. What’s the incentive to get useful open data out?
Q: [yochai] Is open data doing the job we want? Traffic and weather data is great, but the data we care about — are banks violating privacy, are we being spied on? — don’t come from open data but from FOIA requests.
A: (1) Yes, but FOI laws regulate the ability to access documents which are themselves a manipulation to create a report. By “neutral facts” I meant the data, although the creation of columns and files is not neutral. Current FOI laws don’t let you access that data in most countries. (2) Yes, there will still be gatekeepers, but they have less power. For one thing, they can’t foresee what might be derived from cross-referencing data sets.
Q: [jennifer] Open data doesn’t respond to a demand. FOI does.
A: FOI remains demand driven. And it may be that open data is creating new demand.

Q: [sascha] You’re getting pushback because you’re framing open data as the new FOI. But the state is not going to push into the open data sets the stuff that matters. Maybe you want to say that WikiLeaks is the new FOI, and open data is something new.
A: Yes, I don’t think open data replaces FOI. Open data is a complement. In most countries, you can’t get at data sets by filing a FOI request.

Q: [yochai] The political and emotional energy is being poured into open data. If an administration puts millions of bits of irrelevant data onto but brings more whistleblower suits than ever before,…to hold up that administration as the model of transparency is a real problem. It’d be more useful to make the FOI process more transparent and shareable. If you think the core is to make the govt reveal things it doesn’t want to do, then those are the interesting interventions, and open data is a really interesting complement. If you think that you can’t hide once the data is out there, then open data is the big thing. We need to focus our political energy on strengthening FOI. Your presentation represents the zeitgeist around open data, and that deserves thinking.
Q: [micah] Felipe is actually quite critical of I don’t know of anyone in the transparency movement who’s holding up the Obama gov’t as a positive model.
A: Our NGO built Acceso Inteligente which is like WhatDoTheyKnow. It publishes all the questions and responses to FOI requests, crowdsourcing knowledge about these requests. was the first one and was the model for others. But you’re right that there are core issues on the table. But there might be other, smaller, non-provocative actions, like the release of inoffensive data that lets us see that members of Congress have conflicts of interest. It is a new door of opportunity to help us move forward.

Q: [juan carlos] Where are corporations in this mix? Are they not subject to social scrutiny?

Q: [micah] Can average citizens work with this data? Where are the intermediaries coming from?
A: Often the data are complex. The press often act as intermediaries.

Q: Instead of asking for an overflow of undifferentiated data, could we push for FOI to allow citizens’ demands for data, e.g., for info about banks?
A: We should push for more reactive transparency.

Q: [me] But this suggests a reframing: FOI should be changed to enable citizens to demand access to open data sets.

Q: We want different types of data. We want open data in part to see how the govt as a machine operates. We need both. There are different motivations.

Q: I work at the community level. We assume that the intermediaries are going to be neutral bodies. But NGOs are not neutral. Also, anyone have examples of citizens being consulted about what types of data should be released to open data portals?
A: The Kenya open data platform is there but many Kenyans don’t know what to do with it. And local governments may not release info because they don’t trust what the intermediaries will do with it.


[2b2k] Book talk in Brookline/Boston

I’m giving my hometown book talk about Too Big to Know tomorrow, Monday, at the estimable Brookline Booksmith at 7pm. It’s free! Come!


[2b2k] Moi moi moi

Steve Cottle has done a great job live-blogging my wrap-up talk at the Tech@State event. Thanks, Steve!

I was the guest on Tummelvision a couple of nights ago, which is a podcast tumble-tumult of persons and ideas. It doesn’t get much more fun than that. Thanks, Heather, Kevin, and Deb!

The Berkman Center has posted the video of my book talk. Look on the bottom left to find the player and the links.

KMWorld’s Hugh McKellar has posted his interview with me.

And NYTECH has just posted a video of my talk there on Jan 25. The talk is about 45 mins and then there’s a lively Q&A. Thanks NY TECH!

Brandeins has posted an interview with Doc Searls and me about Cluetrain. (They translated it into German.)


[2b2k] The corruption of impact

According to a survey published in Science [abstract][Slashdot] scientists are routinely pressured to include superfluous references in their papers in order to boost the Impact Factor of the journal publishing their paper. The Impact Factor is (roughly) a measure of the importance/influence of a journal, based on a two-year average of how often its papers are cited. Careers are made by publishing in high Impact Factor journals.
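To see why a few extra references matter, here is a rough Python sketch of the standard two-year calculation: citations received this year to a journal’s papers from the previous two years, divided by the number of those papers. The journal and every number below are made up for illustration. They do not come from the survey.

```python
def impact_factor(citations_this_year, citable_items):
    """Two-year Impact Factor: this year's citations to papers from the
    previous two years, divided by the number of those papers."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations_this_year / citable_items


# Hypothetical journal: 200 papers over the previous two years,
# cited 500 times this year.
honest = impact_factor(500, 200)

# Same journal if each of this year's 150 accepted papers is pressured
# into adding two superfluous citations to the journal's recent articles.
coerced = impact_factor(500 + 150 * 2, 200)

print(f"without coerced citations: {honest:.2f}")
print(f"with coerced citations:    {coerced:.2f}")
```

Because the denominator is small and fixed, a modest number of coerced citations moves the result a lot.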

This sort of corruption (which I talk about a bit in Too Big to Know) might seem like an inevitable imprecision in how we gauge something as vague as “influence” if alternatives were not becoming available. Services like Mendeley can provide real-time readouts of which articles are being read and commented on. Google likewise can see how often articles are being linked to. Facebook can see how articles are being passed around social networks, some of which are quite expert. It would of course be good to have measures not gated by commercial entities. In any case, institutions of knowledge are currently relying upon an instrument that was always too blunt and is now known to be corrupt.


[tech@state][2b2k] Real-time awareness

At the Tech@State conf, a panel is starting up. Participants: Linton Wells (National Defense U), Robert Bectel (CTO, Office of Energy Efficiency), Robert Kirkpatrick (Dir., UN Global Pulse), Ahmed Al Omran (NPR and Saudi blogger), and Clark Freifeld (

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Robert Bectel brought in Netvibes [I use NetVibes as my morning newspaper.] to bring real-time info to his group’s desktops. It’s customized to who they are and what they do. They use Netvibes as their portal. They bring in streaming content, including YouTube and Twitter. What happens when people get too much info? So, they’re building analytics so people get info summarized in bar charts, etc. Even video analytics, analyzing video content. They asked what people wanted and built a food cart tracker. Or the shuttle bus. Widgets bring functionality within the window. They’re working on single sign-on. There’s some gamification. They plan on adding doc mgt, SharePoint access, links to Federal Social Network.

Even better, he says, is that the public now can get access to the “wicked science” the DOE does. Make the data available. Go to IMBY, put in your zip code, and it will tell you what your solar resource potential is and the tax breaks you’ll get. “We’re going to put that in your phone.” “We’re creating leads for solar installers.” And geothermal heat pumps.

Robert Kirkpatrick works in the UN Sect’y Gen’l’s office at Global Pulse, an R&D lab trying to learn to take advantage of Big Data to improve human welfare. Now “We’re swimming in an ocean of real time data.” This data is generated passively and actively. If you look at what people say to one another and what people actually do, “we have the opportunity to look at these as sensor networks.” Businesses have been doing this for a long time. Can we begin to look at the patterns of data when people lose their job, get sick, pull their kids out of school to make ends meet? What patterns appear when our programs are working? Global Pulse is working with the private sector as well. Robert hopes that big data and real-time awareness will enable them to move from waterfall development (staged, slow) to agile (iterative, fast).

Ahmed Al Omran says last year was a moment he and so many in the Middle East had been hoping for. He started blogging (SaudiJeans) seven years ago, even though the gov’t tried to silence him. “I wasn’t afraid because I knew I wasn’t alone.” He was part of a network of activists. Arab Spring did not happen overnight. “Activists and bloggers had been working together for ten years to make it happen.” “There’s no question in my mind that the Internet and social media played a huge role in what happened.” But there is much debate. E.g., Malcolm Gladwell argued that these revolutions would have happened anyway. But no one debates whether the Net changed how journalists covered the story. E.g., Andy Carvin live-tweeted the revolutions (aggregating and disseminating). Others, too. On Feb. 2, 2011, Andy tweeted 1,400 times over 20 hours.

So, do we call this journalism? Probably. It’s a real-time news gathering operation happening in an open source newsroom. “The people who follow us are not our audience. They are part of an open newsroom. They are potential sources and fact-checkers.” E.g., the media carried a story during the war in Libya that the Libyan forces were using Israeli weapons. Andy and his followers debunked that in real time.

There is still a lot of work to do, he says.

Clark Freifeld is a cofounder of HealthMap, doing real time infectious disease tracking. He shows a chart of the stock price of a Chinese pharma that makes a product that’s believed to have antiviral properties. In Jan 2003, there was an uptick because of the beginning of SARS, which was not identified until Feb 2003. In traditional public health reporting, there’s a hierarchy. In the new model, the connections are much flatter. And there are many more sources of info, from tweets that are fast but tend to have more noise, to slower but more validated sources.

To organize the info better, in 2006 they created a real-time mapping dashboard (free and open to the public). They collect 2000 reports a day, geotagged to 10,000 locations. They use named entity extraction to find diseases and locations. A Bayesian filtering system categorizes the reports with 91% accuracy. They assign significance to each event. The ones that make it through this filter make it to the map. Humans help to train the system.
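For readers who haven’t met Bayesian filtering, here is a toy Python sketch of the general idea: a naive Bayes classifier that learns from human-labeled reports and then sorts new ones. It is emphatically not HealthMap’s system. The class name, the labels, and the four training sentences are invented for illustration, and a real filter would train on far more data.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training reports
        self.vocabulary = set()

    def train(self, text, label):
        """Record one human-labeled report."""
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training reports, labeled by a human.
report_filter = NaiveBayesFilter()
report_filter.train("cholera outbreak reported in coastal district", "outbreak")
report_filter.train("new measles cases confirmed by clinic", "outbreak")
report_filter.train("pharma company stock rises after earnings", "not_outbreak")
report_filter.train("hospital opens new parking garage", "not_outbreak")

label = report_filter.classify("measles outbreak confirmed")
print(label)
```

The “humans help to train the system” part corresponds to the `train` calls: every report a person labels shifts the word counts the filter uses next time.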

During the H1N1 outbreak, they decided to create participatory epidemiology. They launched an iPhone app called “Outbreaks Near Me” that let people submit reports as well as get alerts, and it became the #1 health and fitness app. They found that the rate of submissions tracked well with the CDC’s info. Also

Linton Wells now moderates a discussion:

Robert Bectel: DOE is getting a serious fire hose of info from the grid, and they don’t yet know what to do with it. So they’re thinking about releasing the 89B data points and asking the public what they want to do with it.

Robert Kirkpatrick: You need the wisdom of crowds, the instinct of experts, and the power of algorithms [quoting someone I missed]. And this flood of info is no longer a one-way stream; it’s interactive.

Ahmed: It helps to have people who speak the language and know the culture. But tech counts too: How about a Twitter client that can detect tweets coming from a particular location? It’s a combo of both.

Clark: We use this combined approach. One initiative we’re working on builds on our smartphone app by letting us push questions out to people in a location where we have a suspicion that something is happening.

Linton: Security and verification?

Robert K: Info can be exploited, so this year we’re bringing together advisers on privacy and security.

Ahmed: People ask how you can trust random people to tell the truth, but many of them are well known to us. We use standard tools of trust, and we’ll also see who they’re following on Twitter, who’s following them, etc. It’s real-time verification.

Clark: In public health, the ability to get info is much better with an open Net than the old hierarchical flow of info.

Q: Are people trying to game the system?
A: Ahmed: Sure. GayGirlInDamascus turned out to be a guy in Moscow. But using the very same tools we managed to figure out who he was. But gov’ts will always try to push back. The gov’ts in Syria and Bahrain hired people to go online to change the narrative and discredit people. It’s always a challenge to figure out what’s the truth. But if you’ve worked in the field for a while, you can identify trusted sources. We call this “news sense.”
A: Clark: Not so much in public health. When there have been imposters and liars, there’s been a rapid debunking using the same tools.

Q:What incentives can we give for opening up corporate data?
A: Robert K: We call this data philanthropy but the private sector doesn’t see it that way. They don’t want their markets to fall into poverty; it’s business risk mitigation insurance. So there are some incentives there already.
A: Robert B: We need to make it possible for people to create apps that use the data.

Q: How about that Twitter censorship policy?
A: Ahmed: It’s censorship, but the way Twitter approached this was transparent, and some people say it is good for activists because Twitter could have gone for a broader censorship policy; Twitter will only block in the country that demands it. In fact, Twitter lets you get around it by changing your location.

Q: How do we get Netvibes past the security concerns?
A: Robert B.: I’m a security geek. But employees need tools to be smarter. But we can define what tools you have access to.

Q: Clark, do you run into privacy issues?
A: Clark: Most of the data in HealthMap comes from publicly available sources.
A: Robert K: There are situations arising for which we do not have a framework. A child protection expert had just returned from a crisis where young kids on a street were tweeting about being abused at home. “We’re not even allowed to ask that question,” she said, “but if they’re telling the entire world, can we use that to begin to advocate for their rescue?” Our frameworks have not yet adapted to this new reality.

Linton: After the Arab Spring, how do we use data to help build enduring value?
A: Ahmed: It’s not the tech but how we use it.
A: Robert K: Real time analytics and visualizations provide many-to-many communications. Groups can see their beliefs, enabling a type of self-awareness not possible before. These tools have the possibility of creating new types of identity.
A: Robert B: To make Twitter or Facebook smarter, you have to find different ways to use it. “Break it!” Don’t get stuck using today’s tech.

Linton: A 26-year-old Al Jazeera reporter was asked at a conf, “What’s the next big thing?” She replied, “I’m too old. Ask a high school student.”



Here’s a video of my talk at the Harvard Bookstore last week.

And Ulrike Reinhart has posted a two-part podcast from the summer of a conversation I had with Peter Kruse, here and here. Thanks, Ulrike and Peter!