Archive for March, 2011

[2b2k] The encyclopedia of changes

I just shared a cab with James Bridle, a UK publisher and digital activist (my designation, not his) who is the brilliance behind the printing out of the changes to the Wikipedia article on the Iraq War. It turns out that those changes — just the changed portions — fill up twelve volumes.

What does the project show? “The argument,” James says. Of course it also shows the power of the cognitive surplus: we just casually created twelve volumes of changes in our spare time. If only all users of Wikipedia understood how it’s put together! (Rather than banning students from using Wikipedia, it’d be far better if teachers required students to click on the “Discussion” tab.)


[2b2k] Melting points: a model for open data?

Jean-Claude Bradley at Useful Chemistry has announced (a few weeks ago) that the international chemical company Alfa Aesar has agreed to open source its melting point data. This is important not just because Alfa Aesar is one of the most important sources of that information. It also provides a model that could work outside of chemistry and science.

The data will be useful to the Open Notebook Science solubility project, and because Alfa has agreed to Open Data access, it can be useful far beyond that. In return, the Open Notebook folks cleaned up Alfa’s data, putting it into a clean database format, providing unique IDs (ChemSpiderIDs), and linking back to the Alfa Aesar catalog page.

Open Notebook then merged the cleaned-up data set with several others. The result was a set of 13,436 Open Data melting point values.
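For the curious, here is a minimal sketch of what that kind of clean-and-merge step might look like. It is not the Open Notebook group's actual code: the field names (`id`, `mp`), the source labels, and every ID and value in the sample data are invented for illustration. The idea is simply to normalize each record, key it by a shared compound ID, and keep track of which source each value came from.

```python
from collections import defaultdict
from statistics import mean


def clean_record(raw, source):
    """Normalize one raw record into a common shape (hypothetical fields)."""
    mp_text = str(raw["mp"]).replace("°C", "").strip()
    return {
        "compound_id": int(raw["id"]),      # shared unique ID, e.g. a ChemSpider-style ID
        "melting_point_c": float(mp_text),  # degrees Celsius
        "source": source,                   # provenance, so values can be traced back
    }


def merge_datasets(datasets):
    """Group cleaned records from several sources by compound ID."""
    merged = defaultdict(list)
    for source, records in datasets.items():
        for raw in records:
            rec = clean_record(raw, source)
            merged[rec["compound_id"]].append(rec)
    return dict(merged)


def average_melting_points(merged):
    """Average the reported values per compound (one simple way to reconcile sources)."""
    return {
        cid: mean([rec["melting_point_c"] for rec in recs])
        for cid, recs in merged.items()
    }


# Invented sample data: the IDs and values are placeholders, not real measurements.
datasets = {
    "vendor_catalog": [{"id": "1001", "mp": "80.5 °C"}, {"id": "1002", "mp": "122"}],
    "other_collection": [{"id": 1001, "mp": 81.5}],
}

merged = merge_datasets(datasets)
averages = average_melting_points(merged)
print(len(merged), "compounds;", sum(len(v) for v in merged.values()), "values")
```

Keeping the source on every record matters: where two collections disagree about a compound, you can see who said what rather than just getting a blended number.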

They then created a Web tool for exploring the merged dataset.

Why stop with melting points? Why stop with chemistry? Open data for, say, books could lead readers to libraries, publishers, bookstores, courses, other readers…


Can there be too much information? And what would it be too much of?

As PR for an upcoming appearance by James Gleick, whose new book The Information I am greatly looking forward to reading, Zocalo Public Square asked four or five folks “Can there be too much information?” It’s an interesting collection of responses. (Well, mine excepted.)

And underneath these interesting-in-themselves essays runs a different question when they are taken together: What the heck do we mean by “information” anyway? I’m not sure any of the respondents is defining it in the same way. The ways include: opinions, raw data, words, ideas, photos, switches and dials, and books. Of course, some of these are containers of information or examples of information. But they do not reduce to a single definition. (I believe Gleick’s book is at least in part about this ambiguity about information. It’s also something I’ve been researching for the past couple of years.)

As far as my contribution goes, I had to decide whether to provide an Everything Is Miscellaneous answer (we are learning to organize info in new ways) or a Too Big to Know answer (the quantity of info is changing the nature of knowledge). I went with the new book rather than the old, if only because I wrote the tiny essay within minutes after finishing revising the book manuscript.


Too Big to Know: The Bibliography

Last night I sent my editor, Tim Bartlett, the next rev of Too Big to Know. It took me a few weeks of solid work – somewhat obsessive, perhaps – to respond to my editor’s comments because they were challenging at the level of my arguments (such as they are). Then, after going through it once, I spent another week reading through the entire manuscript to get a better sense of the flow. That quick read-through actually got me to make some fairly substantial changes. It also reminded me once again how easy it is to miss obvious errors. In fact, even after sending it in to my editor, I discovered an “it’s” that should have been an “its” … on the first page. Yikes.

Now Tim has to read through the rev and either come back with more changes for me to work on or pass it on to copy editing. As far as I know, the book is still scheduled for a Fall release.

As part of this rev, I worked on the bibliography. I’m planning on not including it in the book itself, although I’m open to Tim’s advice. In any case, I will put it up at the TooBigToKnow website (which currently consists of nothing but posts tagged here). If you want to see the current version of the bibliography, it’s available as a Google Docs spreadsheet here. I’m thinking that making it available as a spreadsheet online makes it more useful. Also, I plan on annotating it.

Putting it together made me wonder if the ease with which we can do research online is causing the average length of bibliographies to increase…


Why so little blogging?

It’s been a slow week on this blog because I’m doing a final read-through of the second draft of “Too Big to Know.”

Last week I finished accommodating my editor’s comments. He gave it an extraordinarily helpful reading, focusing mainly on making sure my arguments make sense and are clearly expressed. So, it was a challenging rewrite.

Then, having focused on the arguments, generally at the paragraph and occasionally at the section level, I really wanted to re-read the entire thing to check for page-to-page flow — and to have more of the reader’s experience of the book. (Reading it the way readers may read it is the greatest challenge writers face. Or maybe I’ll just speak for myself.)

I’m about two-thirds of the way through and hope to finish this weekend, although I have a family event on Sunday that may keep me from that goal. Also, as I’ve been giving it a supposedly quick re-read, I have stopped in a few places to undo the revisions I’d made, which is a little discouraging.

Anyway, within a few days I hope to have unloaded it back into the arms of my patient editor. Then for me it’s a three-week binge of sensual over-indulgence. Or, possibly, working in my office in the basement of the Law Library.


Imperial College in showdown with closed-access journals

Felix Online, the online news of Imperial College in the UK, reports (in an article by Kadhim Shubber) that Deborah Shorley, Director of the Imperial College London Library, is threatening to end the library’s subscriptions to journals published by Elsevier and Wiley Blackwell, two of the major publishers in the UK. Rather than giving in to the bundling of journals with 6% annual subscription price increases (well above inflation, and in the face of a growth in profits at Elsevier from £1B to £1.6B from 2005 to 2009), she is demanding a 15% reduction in fees, as well as other concessions.

Says the article: “…if an agreement or an alternative delivery plan is not in place by January 2nd next year, researchers at Imperial and elsewhere will lose access to thousands of journals. But Deborah Shorley is determined to take it to the edge if necessary: ‘I will not blink.’”

As the article mentions, in 2010, after a 400% fee increase, the University of California threatened to boycott the Nature Publishing Group, including not engaging in peer review for NPG’s journals. (NPG claims that the rise in fees was due to the reduction of a discount from 88% to 50%. UC disputes this.) In August of 2010, NPG and UC made nice and announced “an agreement to work together to address the current licensing challenges as well as the larger issues of sustainability in the scholarly communication process.” [more and more]
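NPG's explanation is easy to sanity-check with a little arithmetic. The sketch below assumes the discount is taken off a fixed list price, which is my assumption; the actual list price and contract terms aren't given here. The function name is mine, too.

```python
def fee_multiplier(old_discount, new_discount):
    """How many times larger the fee gets when a discount off a fixed
    list price changes from old_discount to new_discount (as fractions)."""
    old_fraction_paid = 1.0 - old_discount   # 88% off means paying 12% of list
    new_fraction_paid = 1.0 - new_discount   # 50% off means paying 50% of list
    return new_fraction_paid / old_fraction_paid


multiplier = fee_multiplier(0.88, 0.50)
percent_increase = (multiplier - 1.0) * 100.0

print(f"new fee is {multiplier:.2f}x the old fee")
print(f"that is an increase of about {percent_increase:.0f}%")
```

Under that assumption, going from paying 12% of list to paying 50% of list makes the fee roughly 4.2 times what it was, so the discount story is at least in the same neighborhood as the reported jump.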

Wow, we’re in a painful transition period. Open access will win.


[2b2k] Tagging big data

According to an article in Science Insider by Dennis Normile, a group formed at a symposium sponsored by the Board on Global Science and Technology, of the National Research Council, an arm of the U.S. National Academies [that's all they've got??] is proposing making it easier to find big scientific data sets by using a standard tag, along with a standard way of conveying the basic info about the nature of the set, and its terms of use. “The group hopes to come up with a protocol within a year that researchers creating large data sets will voluntarily adopt. The group may also seek the endorsement of the Internet Engineering Task Force…”