Archive for category facts

[2b2k] The public ombudsman (or Facts don’t work the way we want)

I don’t care about expensive electric sports cars, but I’m fascinated by the dustup between Elon Musk and the New York Times.

On Sunday, the Times ran an article by John Broder on driving the Tesla S, an all-electric car made by Musk’s company, Tesla. The article was titled “Stalled Out on Tesla’s Electric Highway,” which captured the point quite concisely.

On Wednesday, in a post on the Tesla site, Musk contested Broder’s account and revealed that every car Tesla lends to a reviewer has its telemetry recorders set to 11. Thus, Musk had the data that proved that Broder was driving in a way that could have no conceivable purpose except to make the Tesla S perform below spec: Broder drove faster than he claimed, drove circles in a parking lot for a while, and didn’t recharge the car to full capacity.

Boom! Broder was caught red-handed, and it was data that brung him down. The only two questions left were why did Broder set out to tank the Tesla, and would it take hours or days for him to be fired?

Except…

Rebecca Greenfield at Atlantic Wire took a close look at the data — at least at the charts and maps that express the data — and evaluated how well they support each of Musk’s claims. Overall, not so much. The car’s logs do seem to contradict Broder’s claim to have used cruise control. But the mystery of why Broder drove in circles in a parking lot seems to have a reasonable explanation: he was trying to find exactly where the charging station was in the service center.

But we’re not done. Commenters on the Atlantic piece have both taken it to task and provided some explanatory hypotheses. Greenfield has interpolated some of the more helpful ones, as well as updating her piece with testimony from the tow-truck driver, and more.

But we’re still not done. Margaret Sullivan (@sulliview on Twitter), the NYT “public editor,” a new take on what in the 1960s we started calling “ombudspeople” (although actually in the ’60s we called them “ombudsmen”), has jumped into the fray with a blog post that I admire. She’s acting like a responsible adult by withholding judgment, and she’s acting like a responsible webby adult by talking to us even before all the results are in, acknowledging what she doesn’t know. She’s also been using social media to discuss the topic, and even to try to get Musk to return her calls.

Now, this whole affair is both typical and remarkable:

It’s a confusing mix of assertions and hypotheses, many of which are dependent on what one would like the narrative to be. You’re up for some Big Newspaper Schadenfreude? Then John Broder was out to do dirt to Tesla for some reason your own narrative can supply. You want to believe that old dinosaurs like the NYT are behind the curve in grasping the power of ubiquitous data? Yup, you can do that narrative, too. You think Elon Musk is a thin-skinned capitalist who’s willing to destroy a man’s reputation in order to protect the Tesla brand? Yup. Or substitute “idealist” or “world-saving environmentally-aware genius,” and, yup, you can have that narrative too.

Not all of these narratives are equally supported by the data, of course — assuming you trust the data, which you may not if your narrative is strong enough. Data signals but never captures intention: Was Broder driving around the parking lot to run down the battery or to find a charging station? Nevertheless, the data do tell us how many miles Broder drove (apparently just about the amount that he said) and do nail down (except under the most bizarre conspiracy theories) the actual route. Responsible adults like you and me are going to accept the data and try to form the story that “makes the most sense” around them, a story that likely is going to avoid attributing evil motives to John Broder and evil conspiratorial actions by the NYT.

But the data are not going to settle the hash. In fact, we already have the relevant numbers (er, probably) and yet we’re still arguing. Musk produced the numbers thinking that they’d bring us to accept his account. Greenfield went through those numbers and gave us a different account. The commenters on Greenfield’s post are arguing yet more, sometimes casting new light on what the data mean. We’re not even close to done with this, because it turns out that facts mean less than we’d thought and do a far worse job of settling matters than we’d hoped.

That’s depressing. As always, I am not saying there are no facts, nor that they don’t matter. I’m just reporting empirically that facts don’t settle arguments the way we were told they would. Yet there is something profoundly wonderful and even hopeful about this case that is so typical and so remarkable.

Margaret Sullivan’s job is difficult in the best of circumstances. But before the Web, it must have been so much more terrifying. She would have been the single point of inquiry as the Times tried to assess a situation in which it has deep, strong vested interests. She would have interviewed Broder and Musk. She would have tried to find someone at the NYT or externally to go over the data Musk supplied. She would have pronounced as fairly as she could. But it would have all been on her. That’s not just bad for the person who occupies that position; it’s a bad way to get at the truth. But it was the best we could do. In fact, most of the purpose of the public editor/ombudsperson position before the Web was simply to reassure us that the Times does not think it’s above reproach.

Now every day we can see just how inadequate any single investigator is for any issue that involves human intentions, especially when money and reputations are at stake. We know this for sure because we can see what an inquiry looks like when it’s done in public and at scale. Of course lots of people who don’t even know that they’re grinding axes say all sorts of mean and stupid things on the Web. But there are also conversations that bring to bear specialized expertise and unusual perspectives, that let us turn the matter over in our hands, hold it up to the light, shake it to hear the peculiar rattle it makes, roll it on the floor to gauge its wobble, sniff at it, and run it through sophisticated equipment perhaps used for other purposes. We do this in public — I applaud Sullivan’s call for Musk to open source the data — and in response to one another.

Our old idea was that the thoroughness of an investigation would lead us to a conclusion. Sadly, it often does not. We are likely to disagree about what went on in Broder’s review, and how well the Tesla S actually performed. But we are smarter in our differences than we ever could be when truth was a lonelier affair. The intelligence isn’t in a single conclusion that we all come to — if only — but in the linked network of views from everywhere.

There is a frustrating beauty in the way that knowledge scales.

[2b2k] What do we learn from our failure to believe the polls?

There’s lots being written about why the Republicans were so wrong in their expectations about this week’s election. They had the same data as the rest of us, yet they apparently deeply believed they were going to win. I think it’s a fascinating question. But I want to put it to different use.

The left-wing subtext about the Republican leadership’s failure to interpret the data is that it’s comeuppance for their failure to believe in science or facts. But that almost surely is a misreading. The Republicans thought they had factual grounds for disbelieving the polls. The polls, they thought, were bad data that over-counted Democrats. The Republicans thus applied an unskewing algorithm in order to correct them. Thus, the Republicans weren’t pooh-poohing the importance of facts. They were being good scientists, cleaning up the data. Now, of course their assumptions about the skewing of the data were wrong, and there simply has to be an element of wish-fulfillment (and thus reality denial) in their belief that the polls were skewed. But, their arguments were based on what they thought was a fact about a problem with the data. They were being data-based. They just did a crappy job of it.

So what do we conclude? First, I think it’s important to recognize that it wasn’t just the Republicans who looked the data in the face and drew entirely wrong conclusions. Over and over the mainstream media told us that this race was close, that it was a toss-up. But it wasn’t. Yes, the popular vote was close, although not as close as we’d been led to believe. But the outcome of the race wasn’t a toss-up, wasn’t 50-50, wasn’t close. Obama won the race decisively and not very long after the last mainland polls closed…just as the data said he would. Not only was Nate Silver right, his record, his methodology, and the transparency of his methodology were good reasons for thinking he would be right. Yet, the mainstream media looked at the data and came to the wrong conclusion. It seems likely that they did so because they didn’t want to look like they were shilling for Obama and because they wanted to keep us attached to the TV for the sake of their ratings and ad revenues.

I think the media’s failure to draw the right and true conclusions from the data is a better example of a non-factual dodge around inconvenient truths than is the Republicans’ swerve.

Put the two failures together, and I think this is an example of the inability of facts and data to drive us to agreement. Our temptation might be to look at both of these as fixable aberrations. I think a more sober assessment, however, should lead us to conclude that some significant portion of us is always going to find a way to be misled by facts and data. As a matter of empirical fact, data does not drive agreement, or at least doesn’t drive it sufficiently strongly that by itself it settles issues. For one reason or another, some responsible adults are going to get it wrong.

This doesn’t mean we should give up. It certainly doesn’t lead to a relativist conclusion. It instead leads to an acceptance of the fact that we are never going to agree, even when the data is good, plentiful, and right in front of our eyes. And, yeah, that’s more than a little scary.

[2b2k] The commoditizing and networking of facts

Ars Technica has a post about Wikidata, a proposed new project from the folks that brought you Wikipedia. From the project’s introductory page:

Many Wikipedia articles contain facts and connections to other articles that are not easily understood by a computer, like the population of a country or the place of birth of an actor. In Wikidata you will be able to enter that information in a way that makes it processable by the computer. This means that the machine can provide it in different languages, use it to create overviews of such data, like lists or charts, or answer questions that can hardly be answered automatically today.

Because I had some questions not addressed in the Wikidata pages that I saw, I went onto the Wikidata IRC chat (http://webchat.freenode.net/?channels=#wikimedia-wikidata) where Denny_WMDE answered some questions for me.

[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.

[11:29] go ahead!

[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?

[11:30] two-fold answer

[11:30] 1. there will be a discussion page, yes

[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”

[11:31] wikidata is not about truth

[11:31] but about referenceable facts

When I asked which fact would make it into an article’s info box when the facts are contested, Denny_WMDE replied that they’re working on this, and will post a proposal for discussion.

So, on the one hand, Wikidata is further commoditizing facts: making them easier and thus less expensive to find and “consume.” Historically, this is a good thing. Literacy did this. Tables of logarithms did it. Almanacs did it. Wikipedia has commoditized a level of knowledge one up from facts. Now Wikidata is doing it for facts in a way that not only will make them easy to look up, but will enable them to serve as data in computational quests, such as finding every city with a population of at least 100,000 that has an average temperature below 60F.
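To make the “computational quest” concrete, here is a minimal sketch of that city query in Python. It does not use Wikidata’s actual interface, which I have not seen; the `City` record, the `cool_big_cities` function, and every city name and number below are invented placeholders, so read it only as an illustration of what becomes easy once facts are machine-processable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    population: int
    avg_temp_f: float  # average temperature in degrees Fahrenheit


# Made-up placeholder records, not real statistics.
CITIES = [
    City("Alphaville", 250_000, 52.0),
    City("Betatown", 80_000, 48.5),
    City("Gammaburg", 1_200_000, 71.3),
    City("Deltaport", 100_000, 59.9),
]


def cool_big_cities(cities, min_population=100_000, max_avg_temp_f=60.0):
    """Names of cities with at least min_population people and an
    average temperature strictly below max_avg_temp_f."""
    return [
        c.name
        for c in cities
        if c.population >= min_population and c.avg_temp_f < max_avg_temp_f
    ]


result = cool_big_cities(CITIES)
print(result)
```

With the placeholder data, the query keeps Alphaville and Deltaport and drops the other two, one for being too small and one for being too warm.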

On the other hand, because Wikidata is doing this commoditizing in a networked space, its facts are themselves links — “referenceable facts” are both facts that can be referenced, and simultaneously facts that come with links to their own references. This is what Too Big to Know calls “networked facts.” Those references serve at least three purposes: 1. They let us judge the reliability of the fact. 2. They give us a pointer out into the endless web of facts and references. 3. They remind us that facts are not where the human responsibility for truth ends.
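One way to picture a “referenceable fact” is as a record that carries its sources with it, so that two sources can disagree without either value being thrown away. The sketch below is my own illustration, not Wikidata’s data model: the `Claim` and `Reference` classes and the helper functions are hypothetical, “Source X” comes from the chat above, and “Source Y” with its figure is invented purely to show a disagreement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    source: str  # who asserts the value


@dataclass
class Claim:
    subject: str
    prop: str
    value: int
    references: List[Reference] = field(default_factory=list)


def claims_about(claims, subject, prop):
    """All claims made about one property of one subject, contested or not."""
    return [c for c in claims if c.subject == subject and c.prop == prop]


def describe(claim):
    """Render a claim as 'what is said (per whom)' rather than as bare truth."""
    if not claim.references:
        return f"{claim.subject} {claim.prop} = {claim.value} (unreferenced)"
    sources = ", ".join(r.source for r in claim.references)
    return f"{claim.subject} {claim.prop} = {claim.value} (per {sources})"


CLAIMS = [
    Claim("Berlin", "population", 3_500_000, [Reference("Source X")]),
    # Invented competing figure, only to show two sources disagreeing.
    Claim("Berlin", "population", 3_400_000, [Reference("Source Y")]),
]

berlin = claims_about(CLAIMS, "Berlin", "population")
for claim in berlin:
    print(describe(claim))
```

The design choice that matters here is that the lookup returns every claim rather than a single winner. Deciding which one lands in an info box is a separate, human question.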

[2b2k] Moi

EconTalk has posted an hour-long interview with me by Russ Roberts about some of the topics in Too Big to Know that don’t come up so often.

[2b2k] Moi moi moi

Steve Cottle has done a great job live-blogging my wrap-up talk at the Tech@State event. Thanks, Steve!

I was the guest on Tummelvision a couple of nights ago, which is a podcast tumble-tumult of persons and ideas. It doesn’t get much more fun than that. Thanks, Heather, Kevin, and Deb!

The Berkman Center has posted the video of my book talk. Look on the bottom left to find the player and the links.

KMWorld’s Hugh McKellar has posted his interview with me.

And NYTECH has just posted a video of my talk there on Jan 25. The talk is about 45 mins and then there’s a lively Q&A. Thanks NY TECH!

Brandeins has posted an interview with Doc Searls and me about Cluetrain. (They translated it into German.)

[2b2k] Census Bureau ends Statistical Abstract

The Census Bureau is no longer going to fund the creation of the Statistical Abstract of the United States, apparently in order to save $3M a year. As David Cay Johnston puts it:

Last year the online site was accessed 5.6 million times. If the absence of a Statistical Abstract increases search time by even two minutes, then the cost, based on the all-in average pay of reference librarians, will be about five times the federal savings. Were Congress to order up a cost-benefit study, the figure would be a loser, costing society at least $5 for every dollar of tax money saved.

Not to mention the symbolic slap in the face to supporting fact-based public discourse.

(The Census Bureau attempts to ameliorate this by pointing out that all the info is still available, dispersed across agencies and sources. Yeah, but if the Statistical Abstract ever had value — which it did — it’s because it aggregated data that can be difficult to chase down.)
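Johnston’s arithmetic can be checked on the back of an envelope. The sketch below uses only the figures quoted above (5.6 million accesses, two extra minutes each, roughly $3M saved, a cost of about five times the savings) and works backward to the hourly pay that the claim implies. That hourly figure is my derivation, not a number Johnston states, and he may have started from a slightly different savings figure than the $3M mentioned here.

```python
# Figures taken from the post and the quoted passage.
ACCESSES_PER_YEAR = 5_600_000
EXTRA_MINUTES_PER_SEARCH = 2
FEDERAL_SAVINGS_USD = 3_000_000  # "apparently ... $3M a year"
COST_MULTIPLE = 5                # "about five times the federal savings"

# Extra search time per year, in hours.
extra_hours = ACCESSES_PER_YEAR * EXTRA_MINUTES_PER_SEARCH / 60

# Total cost to society implied by the "five times" claim.
implied_cost_usd = COST_MULTIPLE * FEDERAL_SAVINGS_USD

# Derived, not quoted: the all-in hourly pay that would make the claim hold.
implied_hourly_rate = implied_cost_usd / extra_hours

print(f"Extra hours per year: {extra_hours:,.0f}")
print(f"Implied cost: ${implied_cost_usd:,}")
print(f"Implied all-in hourly pay: ${implied_hourly_rate:.2f}")
```

Two minutes per lookup adds up to about 187,000 hours a year, so the “five times” figure holds if the all-in cost of a reference librarian’s hour is around $80.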
