Archive for September, 2012

[2b2k] Knowledge and the future of story-telling

I’m leading one of the many sessions at the Future of Story-telling conference this week. They’ve got an interesting methodology: They produced a short video for each of the sessions. Attendees are required to watch all 15 in order to decide which sessions to go to. The sessions are open discussions on the topics in the videos, without any slide decks, etc. It’s a really interesting set of discussion leaders. I’m expecting it to be unique and provocative.

Here’s the video they produced for me:

(My one concern about the conference: They do not want us using computers, smart phones, etc., to keep us “present.” But the Net is my present!)


[2b2k] What knowledge is losing

Jon Lebkowsky in a discussion of Too Big to Know at The Well asked, “What new roles are emerging that weren’t there before?”

Here’s part of my answer (with a few typos fixed):

- Taxonomies, nomenclatures, classification. Having common ways to refer to things is really helpful. We can make up for them to at least some degree by cross-walking and mapping. It’s always going to be messy. The rise of unique IDs and namespaces is helping a great deal.

- Filters. We used to not worry about filters because all we could get was the filtered product. Now we have to worry about them all the time. But we also now filter forward rather than filter out: When the site TheBrowser.com puts together a front page with 10 items on it from around the Web, all the other items that didn’t make it onto the front page are still fully available; TheBrowser.com has merely reduced the number of clicks it takes to get to its ten.

- Consensus. We used to think that we “all” agreed on some things. We had authorities we “all” trusted. Now we have communities of belief. Links and conversation can help us get past the fragmentation that makes us stupid, but not past all fragmentation.

But we should keep in mind that we’ve lost these old formations to a large degree because they don’t scale, and because they presented themselves to us under false pretenses: they were never as baked into the world as they seemed.

It’s our knowledge now.


[2b2k] Decisions and character

I just read Michael Lewis’ tag-along look at President Obama. It shows aspects of Obama not readily on display. But mainly it’s about being the President as Decider.

The article makes it clear to me that the presidency is not a possible job. No one can be adequately prepared to deal with the range of issues the president faces, most of which have significant effects on very real people. The president therefore needs processes that enable him (so far it’s been hims, kids) to make good decisions, the personality that will let him embrace those processes, and the character to continue making decisions while fully appreciating the consequences of his actions.

Mothers, don’t let your kids grow up to be presidents. Holy cow.


[2b2k] Truth as meta

I’m engaged in a multi-day conversation at The Well, led by Jon Lebkowsky — join in! — about Too Big to Know, and found myself summing up the book as follows:

Traditional knowledge seemed like true content handed to us by competent experts. Networked knowledge seems like the work of humans who never quite get anything right.

Now, I’m of course not completely satisfied with that answer; if I were, I would have written a tweet instead of a book. But it leads to one of my many fears about this new knowledge ecosystem, which nevertheless holds such tremendous promise.

I think the Net only makes us smarter if we come to understand that truth is a complex of metadata — if I may put it in the least helpful way possible. In fact, you could substitute “authority” for “truth” in that sentence and have a less contentious way of putting it, and we can postpone the debate about whether there is really much of a difference between the two terms. Anyway, the simple point I’m failing to make is that the paper world tends toward establishing truths. Once established, they can be accepted without regard for the process by which they were established. Of course scholars and experts in the field will always be willing to challenge those processes, but our knowledge strategy has been to build upon a bedrock of established truths without having to re-establish each of them.

It is no accident that this mirrors the strengths and limitations of publishing truths on paper. Once published, paper-based works are literally independent of their sources. This independence enables truths to be distributed around the world, but at a cost. One of Plato’s problems with paper as opposed to dialogue was in fact that you can’t ask the paper any questions. Not only are we cut off from the processes that led to that truth, but the paper seems inevitably to take on its own authority: If it made it through the editorial filters that the finitude of paper and bookshelves necessitate, then it must have some value.

It’s different on the Net. All it takes is a link to enable readers to see the processes — the drafts, the revisions, the arguments — that led to the page they’re reading. Authorial pride may get in the way of showing these processes, but increasingly the signal is flipping, so that not showing your work is taken as a sign of pretension, arrogance, or even fear, while showing the drafts and disagreements signals confidence and a commitment to truth…

…because truth on the Net needs to be more than the totality of statements that are true. For us to advance as a culture, we need to understand the human involvement in truth. We need to have as a guiding assumption that truth is something we argue about, that it is always seen from a particular historical and cultural position, that it is never simply the statement that asserts something true.

And the Net is great at that. Links can lead us back to the processes that led to the assertions on the page, and links can lead us out into a world that interprets and challenges the assertions. Our overall experience of the Web as chaotic informs us that there are lots of different ideas, and, no, they don’t all fit together harmoniously.

If we stick with our old habits on the Net, then not only do we fail to advance, we regress. There are more untruths to learn on the Net than there ever were in the paper world. If we don’t grow into the assumption that truth always has a meta context, we will believe more flat-footed lies.

Now, I’m optimistic about this. I think some of these lessons are learned simply by being on the Web: Ideas are hyperlinked. The world is in disagreement. But these lessons are not inevitable, or at least they can be suppressed by our old instincts and by our intellectual laziness (or call it efficiency if you prefer): Just as when we see a bright shiny object, our eyes twitch toward it, when we see a bright rectangle of text and graphics, our brains twitch toward giving it credence. That was a much more useful (lazy/efficient) reflex in the paper days when publication entailed filtering. It is a habit that leads us away from truth in the Net age.

And the evidence is not entirely encouraging. One study — which I cannot find, thus causing my entire argument here to do the Happy Irony Dance — found that only a tiny percentage of students who consult Wikipedia ever look at the “talk” or “discussion” pages where Wikipedia’s assertions are argued. That’s in part a failure of education and a failure by Wikipedia to explain itself. It is in part a reflection of the fact that people generally come to an encyclopedia to get answers, not to read back-and-forth arguments. But apparently (see the Irony Dance above) only a small percentage of Wikipedia users even know what the Talk pages are.

One of the definitions of “fundamentalism” of any kind is that it is the assumption that texts speak for themselves, without interpretation or inquiry. Fundamentalism becomes much more dangerous when the seeker of belief has a near infinity of scriptures from which to choose. I believe the Net is making us far smarter, but on cloudy days I wonder.


Obesity is good for your heart

From TheHeart.org, an article by Lisa Nainggolan:

Gothenburg, Sweden – Further support for the concept of the obesity paradox has come from a large study of patients with acute coronary syndrome (ACS) in the Swedish Coronary Angiography and Angioplasty Registry (SCAAR) [1]. Those who were deemed overweight or obese by body-mass index (BMI) had a lower risk of death after PCI [percutaneous coronary intervention, aka angioplasty] than normal-weight or underweight participants up to three years after hospitalization, report Dr Oskar Angerås (University of Gothenburg, Sweden) and colleagues in their paper, published online September 5, 2012 in the European Heart Journal.

Can confirm. My grandmother in the 1930s was instructed to make sure she fed her husband lots and lots of butter to lubricate his heart after a heart attack. This proved to work extraordinarily well, at least until his next heart attack.

I refer once again to the classic 1999 headline from The Onion: “Eggs Good for You This Week.”


[2b2k] Library as platform

Library Journal just posted my article “Library as Platform.” It’s likely to show up in their print version in October.

It argues that libraries ought to think of themselves not as portals but as open platforms that give access to all the information and metadata they can, in both human-readable and computer-readable forms.
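
To make the platform idea a bit more concrete, here’s a minimal sketch (in Python, using Flask) of serving the same catalog record in both a human-readable and a computer-readable form. The record, field names, and endpoint paths are all invented for illustration; none of this comes from the article itself.

    from flask import Flask, jsonify, render_template_string

    app = Flask(__name__)

    # One invented catalog record, standing in for the library's data.
    CATALOG = {
        "b123": {
            "title": "Too Big to Know",
            "creator": "David Weinberger",
            "year": 2012,
            "subjects": ["knowledge", "networks"],
        }
    }

    @app.route("/records/<record_id>")      # human-readable page
    def record_html(record_id):
        rec = CATALOG.get(record_id)
        if rec is None:
            return "Not found", 404
        return render_template_string(
            "<h1>{{ r.title }}</h1><p>{{ r.creator }}, {{ r.year }}</p>", r=rec)

    @app.route("/api/records/<record_id>")  # computer-readable view of the same data
    def record_json(record_id):
        rec = CATALOG.get(record_id)
        if rec is None:
            return jsonify(error="not found"), 404
        return jsonify(rec)

    if __name__ == "__main__":
        app.run()

The point of the second route is that other people’s apps can build on the library’s data without having to scrape the first.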


[2b2k] Crowdsourcing transcription

[This article is also posted at Digital Scholarship@Harvard.]

Marc Parry has an excellent article at the Chronicle of Higher Ed about using crowdsourcing to make archives more digitally useful:

Many people have taken part in crowdsourced science research, volunteering to classify galaxies, fold proteins, or transcribe old weather information from wartime ship logs for use in climate modeling. These days humanists are increasingly throwing open the digital gates, too. Civil War-era diaries, historical menus, the papers of the English philosopher Jeremy Bentham—all have been made available to volunteer transcribers in recent years. In January the National Archives released its own cache of documents to the crowd via its Citizen Archivist Dashboard, a collection that includes letters to a Civil War spy, suffrage petitions, and fugitive-slave case files.

Marc cites an article [full text] in Literary & Linguistic Computing that found that team members could have completed the transcription of works by Jeremy Bentham faster if they had devoted themselves to that task instead of managing the crowd of volunteer transcribers. Here are some more details about the project and its negative finding, based on the article in L&LC.

The project was supported by a grant of £262,673 from the Arts and Humanities Research Council, for 12 months, which included the cost of digitizing the material and creating the transcription tools. The end result was text marked up with TEI-compliant XML that can be easily interpreted and rendered by other apps.
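
To give a sense of what “easily interpreted and rendered by other apps” means in practice, here’s a small Python sketch that pulls editorial markup out of a TEI-style fragment. The fragment is my own invented example (and it omits the TEI namespace for brevity), not an actual Bentham transcript.

    import xml.etree.ElementTree as ET

    # Invented TEI-style fragment: <del> marks a deletion, <add> an insertion,
    # <unclear> a word the transcriber could not confidently read.
    fragment = """
    <p>the greatest <del>good</del><add>happiness</add> of the
    greatest <unclear>number</unclear></p>
    """

    root = ET.fromstring(fragment)
    deletions = [el.text for el in root.iter("del")]
    additions = [el.text for el in root.iter("add")]
    unclear = [el.text for el in root.iter("unclear")]
    print(deletions, additions, unclear)  # ['good'] ['happiness'] ['number']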

During a six-month period, 1,207 volunteers registered, who together transcribed 1,009 manuscripts. 21% of those registered users actually did some transcribing. 2.7% of the transcribers produced 70% of all the transcribed manuscripts. (These numbers refer to the period before the New York Times publicized the project.)
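
For a sense of scale, here’s the arithmetic those percentages imply (the rounding to whole people is mine; the percentages and totals are from the article):

    registered = 1207
    manuscripts = 1009

    transcribers = round(0.21 * registered)   # ~253 people did any transcribing
    core = round(0.027 * transcribers)        # ~7 people make up the 2.7% core
    core_output = round(0.70 * manuscripts)   # ~706 manuscripts came from that core
    print(transcribers, core, core_output)    # 253 7 706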

Of the manuscripts transcribed, 56% were “deemed to be complete.” But the team was quite happy with the progress the volunteers made:

Over the testing period as a whole, volunteers transcribed an average of thirty-five manuscripts each week; if this rate were to be maintained, then 1,820 transcripts would be produced every twelve months. Taking Bentham’s difficult handwriting, the complexity and length of the manuscripts, and the text-encoding into consideration, the volume of work carried out by Transcribe Bentham volunteers is quite remarkable


Still, as Marc points out, two Research Associates spent considerable time moderating the volunteers and providing the quality control required before certifying a document as done. The L&LC article estimates that the RAs could have transcribed 400 transcripts per month, 2.5x faster than the pace of the volunteers. But the volunteers got better as they gained experience, and improvements to the transcription software might make quality control less of an issue.
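
As a quick back-of-the-envelope check of that comparison, using the 35-manuscripts-per-week figure from the passage quoted above (the monthly conversion and rounding are mine):

    per_week = 35
    per_year = per_week * 52          # 1,820 -- matches the figure in the quote
    per_month = per_year / 12         # ~152 manuscripts per month from the volunteers

    ra_per_month = 400                # the article's estimate for the two RAs
    print(ra_per_month / per_month)   # ~2.6, in line with the 2.5x cited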

The L&LC article suggests two additional reasons why the project might be considered a success. First, it generated lots of publicity about the Bentham collection. Second, “no funding body would ever provide a grant for mere transcription alone.” But both of these reasons depend upon crowdsourcing being a novelty. At some point, it will not be.

Based on the Bentham project’s experience, it seems to me there are a few plausible ways crowdsourced transcription could become practical: First, as the article notes, if the project had continued, the volunteers might have gotten substantially more productive and more accurate. Second, better software might drive down the need for extensive moderation, as the article suggests. Third, there may be a better way to structure the crowd’s participation. For example, it might be practical to use Amazon Mechanical Turk to pay the crowd to do two or three independent passes over the content, which can then be compared for accuracy. Fourth, algorithmic transcription might get good enough that there’s less for humans to do. Fifth, someone might invent something incredibly clever that increases the accuracy of the crowdsourced transcriptions. In fact, someone already has: reCAPTCHA transcribes tens of millions of words every day. So you never know what our clever species will come up with.
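
To make the third possibility a bit more concrete, here’s a minimal sketch of the comparison step: given two or three independent transcriptions of the same page, measure how closely they agree and flag the pages that still need an expert’s eye. It’s my own illustration in Python, not anything the Bentham project or Mechanical Turk actually provides; the page labels and the agreement threshold are invented.

    from difflib import SequenceMatcher
    from itertools import combinations

    def agreement(passes):
        """Mean pairwise similarity (0..1) between independent transcription passes."""
        scores = [SequenceMatcher(None, a, b).ratio()
                  for a, b in combinations(passes, 2)]
        return sum(scores) / len(scores)

    def triage(pages, threshold=0.95):
        """Accept pages the passes agree on; send the rest for expert review."""
        accepted, needs_review = [], []
        for page_id, passes in pages.items():
            (accepted if agreement(passes) >= threshold else needs_review).append(page_id)
        return accepted, needs_review

    pages = {
        "page-1": ["the greatest happiness of the greatest number",
                   "the greatest happiness of the greatest number"],
        "page-2": ["punishment is in itself evil",
                   "punishmant is in itself eval"],
    }
    print(triage(pages))  # (['page-1'], ['page-2'])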

For now, though, the results of the Bentham project cannot be encouraging for those looking for a pragmatic way to generate high-quality transcriptions rapidly.
