Archive for October, 2011

[2b2k] Will digital scholarship ever keep up?

Scott F. Johnson has posted a dystopic provocation about the present of digital scholarship and possibly about its future.

Here’s the crux of his argument:

… as the deluge of information increases at a very fast pace — including both the digitization of scholarly materials unavailable in digital form previously and the new production of journals and books in digital form — and as the tools that scholars use to sift, sort, and search this material are increasingly unable to keep up — either by being limited in terms of the sheer amount of data they can deal with, or in terms of becoming so complex in terms of usability that the average scholar can’t use it — then the less likely it will be that a scholar can adequately cover the research material and write a convincing scholarly narrative today.

Thus, I would argue that in the future, when the computational tools (whatever they may be) eventually develop to a point of dealing profitably with the new deluge of digital scholarship, the backward-looking view of scholarship in our current transitional period may be generally disparaging. It may be so disparaging, in fact, that the scholarship of our generation will be seen as not trustworthy, or inherently compromised in some way by comparison with what came before (pre-digital) and what will come after (sophisticatedly digital).

Scott tentatively concludes:

For the moment one solution is to read less, but better. This may seem a luddite approach to the problem, but what other choice is there?

First, I should point out that the rest of Scott’s post makes it clear that he’s no Luddite. He understands the advantages of digital scholarship. But I look at this a little differently.

I agree with most of Scott’s description of the current state of digital scholarship and with the inevitability of an ever-increasing deluge of scholarly digital material. But I think the issue is not that the filters won’t be able to keep up with the deluge. Rather, I think we’re just going to have to give up on the idea of “keeping up” — much as newspapers and half-hour news broadcasts have had to give up the pretense that they are covering all the day’s events. The idea of coverage was always an internalization of the limitations of the old media, as if a newspaper, a broadcast, or even the lifetime of a scholar could embrace everything important there is to know about a field. Now the Net has made clear to us what we knew all along: most of what knowledge wanted to do was a mere dream.

So, for me the question is what scholarship and expertise look like when they cannot attain a sense of mastery by artificially limiting the material with which they have to deal. It was much easier when you only had to read at the pace of the publishers. Now you’d have to read at the pace of the writers…and there are so many more writers! So, lacking a canon, how can there be experts? How can you be a scholar?

I’m bad at predicting the future, and I don’t know if Scott is right that we will eventually develop such powerful search and filtering tools that the current generation of scholars will look like betwixt-and-between fools (or like an “asterisk,” as Scott says). There’s an argument that even if the pace of growth slows, the pace of complexification will increase. In any case, I’d guess that deep scholars will continue to exist because that’s more a personality trait than a function of the available materials. For example, I’m currently reading Armies of Heaven, by Jay Rubenstein. The depth of his knowledge about the First Crusade is astounding. Astounding. As more of the works he consulted come online, other scholars of similar temperament will find it easier to pursue their deep scholarship. They will read less and better not as a tactic but because that’s how the world beckons to them. But the Net will also support scholars who want to read faster and do more connecting. Finally (and to me most interestingly), the Net is already helping us to address the scaling problem by facilitating the move of knowledge from books to networks. Books don’t scale. Networks do. Although, yes, that fundamentally changes the nature of knowledge and scholarship.

[Note: My initial post embedded one draft inside another and was a total mess. Ack. I've cleaned it up - Oct. 26, 2011, 4:03pm edt.]


[berkman] [2b2k] Michael Nielsen on the networking of science

Michael Nielsen is giving a Berkman talk on the networking of science. (It’s his first talk after his book Reinventing Discovery was published.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by telling the story of Tim Gowers, a Fields Medal winner and blogger. (Four of the 42 living Fields winners have started blogs; two of them are still blogging.) In January 2009, Gowers started posting difficult problems on his blog and working on them in the open. Plus, he invited the public to post ideas in the comments. He called this the Polymath Project. 170,000 words in the comments later, ideas had been proposed and rapidly improved or discarded. A few weeks later, the problem had been solved at an even higher level of generalization.

Michael asks: Why isn’t this more common? He gives an example of the failure of an interesting idea, proposed by a grad student in 2005: Qwiki was supposed to be a super-textbook about quantum mechanics. The site was well built and well marketed. “But science is littered with examples of wikis like this…They are not attracting regular contributors.” Likewise, many scientific social networks are ghost towns. “The fundamental problem is one of opportunity costs. If you’re a young scientist, the way you build your career is through the publication of scientific papers…One mediocre crappy paper is going to do more for your career than a series of brilliant contributions to a wiki.”

Why, then, is the Polymath Project succeeding? Because it used an unconventional means to a conventional end: the participants published two papers out of it. Sites like Qwiki that are ends in themselves go unused. We need a “change in norms in scientific culture,” so that when people are making decisions about grants and jobs, contributors to unconventional formats are rewarded.

How do you achieve a change in the culture? It’s hard. Take the Human Genome Project. In the 1990s, there wasn’t a lot of advantage to individual scientists in sharing their data. In 1996, the Wellcome Trust held a meeting in Bermuda and agreed on principles saying that if you sequenced more than a thousand base pairs, you needed to release the data to a public database, where it would be put into the public domain. The funding agencies baked those principles into policy. In April 2000, Clinton and Blair urged all countries to adopt similar principles.

For this to work, you need enthusiastic acceptance, not just a stick beating scientists into submission. You need scientists to internalize it. Why? Because you need all sorts of correlative data to make lab data useful. E.g., in the Sloan Digital Sky Survey, a huge part of the project was establishing the calibration lines for the data to have meaning to anyone else.

Many scientists are pessimistic about this change occurring. But there are some hopeful precedents. In 1610, Galileo pointed his telescope at Saturn. He was expecting to see a small disk, but he saw a disk with small knobs on either side — the rings, although he couldn’t resolve the image further. He sent letters to four colleagues, including Kepler, that scrambled his discovery into an anagram. This way, if someone else made the discovery, Galileo could unscramble the letters and prove that he had made it first. Leonardo, Newton, Hooke, and Huygens all did this. Scientific journals helped end the practice. The editors of the first journals had trouble convincing scientists to reveal their findings because there was no link between publication and career. The editor of the first scientific journal (the Philosophical Transactions of the Royal Society) goaded scientists into publishing by writing to them suggesting that other scientists were about to disclose what the recipients of the letter were working on. As the economic historian Paul A. David says, the change to the modern system was due to “patron pressure.”

Michael points out that Galileo immediately announced the discovery of four moons of Jupiter in order to get patronage bucks from the Medicis for the right to name them. [Or, as we would do today, The Comcast Moon, the Staples Moon, and the Gosh Honey Your Hair Smells Great Moon.]

Some new ideas: The Journal of Visualized Experiments videotapes lab work, thus revealing tacit knowledge. GigaScience (from Springer) publishes data sets as first-class objects. Open Research Computation makes code into a first-class object. And blog posts are beginning to show up in Google Scholar (possibly because Scholar is paying attention to tags?). So, if your post is cited by lots of articles, it will show up in Scholar.

[in response to a question] A researcher claimed to have solved the P ≠ NP problem. One of the serious mathematicians (Stephen Cook) said it was a serious attempt. Mathematicians and others tore it apart on the Web to see if it was right. About a week later, the consensus was that there was a serious obstruction, although they salvaged a small lemma. The process leveraged expertise in many different areas — statistical physics, logic, etc.

Q: [me] Science has been a type of publishing. How does scientific knowledge change when it becomes a type of networking?
A: You can see this beginning to happen in various fields. E.g., people at Google talk about their software as an ecology. [Afterwards, Michael explained that Google developers use a complex ecology of libraries and services with huge numbers of dependencies.] What will it mean when someone says that the Higgs boson has been found at the LHC? There are millions of lines of code, huge data sets. It will be an example of using networked knowledge to draw a conclusion where no single person has more than a tiny understanding of the chain of inferences that led to the result. How do you do peer review of that paper? Peer review can’t mean that it’s been checked, because no one person can check it. No one has all the capability. How do you validate this knowledge? The methods used to validate it are completely ad hoc. E.g., the Intergovernmental Panel on Climate Change has more data than any one person can evaluate. And they don’t have a method; it’s ad hoc. They do a good job, but it’s ad hoc.

Q: The classification of finite simple groups was similar: a series of papers.
A: Followed by a 1,200-page appendix addressing errors.

Q: It varies by science, of course. For practical work, people need access to the data. For theoretical work, the person who makes the single step that solves the problem should get 98% of the credit. E.g., Newton v. Leibniz on calculus. E.g., Perelman’s approach to the Poincaré conjecture.
A: Yes. Perelman published three papers on a preprint server. Afterward, someone published a paper that filled in the gaps, but Perelman’s was the crucial contribution. This is the normal bickering in science. I would like to see many approaches and gradual consensus. You’ll never have perfect agreement. With transparency, you can go back and see how people came to those ideas.

Q: What is validation? There is a fundamental need for change in the statistical algorithms that many data sets are built on. You have to look at those limitations as well as at the data sets.
A: There are lots of interesting things happening. But I think this is a transient problem. Best practices are still emerging, and there are a lot of statisticians on the case. A move toward more reproducible research and more open sharing of code would help. E.g., many random number generators are broken, as is well known. Having the random-generator code in an open repository makes life much easier.
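[To make that last point concrete, here’s a minimal sketch of my own (not anything Michael showed) of what reproducible randomness looks like in practice: if the shared code pins down both the generator and the seed, anyone who clones the repository can regenerate exactly the same numbers.]

    # A minimal reproducibility sketch (illustration only, in Python).
    # Naming the generator and seed explicitly, instead of relying on a
    # hidden global state, lets others regenerate the exact same data.
    import numpy as np

    def simulate(seed=42, n=1000):
        rng = np.random.Generator(np.random.PCG64(seed))  # explicit PRNG + seed
        samples = rng.normal(loc=0.0, scale=1.0, size=n)
        return samples.mean(), samples.std()

    # Two runs with the same seed are bit-identical.
    assert simulate() == simulate()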

Q: The P ≠ NP episode left a sense that it was a sprint in response to a crisis. But how can it be done in a more scalable way?
A: People go for the most interesting claims.

Q: You mentioned the Bermuda Principles, and NIH requires open-access publication one year after a paper appears. But you don’t see that elsewhere. What are the sociological reasons?
Peter Suber: There’s a more urgent need for medical research. The campaign for open access at NSF is not as large, and the counter-lobby (publishers of scientific journals) is bigger. But Pres. Obama has said he’s willing to do it by executive order if there’s sufficient public support. No sign of action yet.

Q: [peter suber] I want to see researchers enthusiastic about making their research public. How do we construct a link between OA and career?
A: It’s really interesting what’s going on. There’s a lot of discussion about supporting gold OA (publishing in OA journals, as opposed to putting work into an OA repository). Fundamentally, it comes down to a question of values. Can you create a culture in science that views publishing in gold OA journals as better than publishing in prestigious toll journals? The best way, perhaps, is to make it a public issue: make it embarrassing for scientists to lock their work away. The Aaron Swartz case has sparked a public discussion of the role of publishers, especially when they’re making 30% profits.
Q: Peter: Whenever you raise the idea of tweaking tenure criteria, you unleash a tsunami of academic conservatism, even if you make clear that this would still support the same rigorous standards. Can we change the reward system without waiting for it to evolve?
A: There was a proposal a few years ago that it be done purely algorithmically: produce a number based on the citation index. If it had been adopted, simple tweaks to the algorithm could have changed the incentives: “You get a 10% premium for being in a gold OA journal,” etc. (I sketch what such a tweak might look like just after this exchange.)
Q: [peter] One idea was that your work wouldn’t be noticed by the tenure committee if it wasn’t in an OA repository.
A: SPIRES [the high-energy physics literature database] lets you measure the impact of your preprint articles, which has made it easier for people to assess the effect of OA publishing. You see people looking up the SPIRES number of a scientist they just met. You see scientists bragging about the number of times their slides have been downloaded via Mendeley.
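[Here’s a minimal sketch of my own of what the algorithmic proposal above might have looked like, with the tweakable OA premium. Everything in it (the function name, the data layout, the way the 10% premium is applied) is my hypothetical illustration, not the actual proposal:]

    # Hypothetical sketch: a citation-based score with a tweakable premium
    # for gold OA publication. Illustration only; not the real proposal.
    def impact_score(papers, oa_premium=1.10):
        total = 0.0
        for paper in papers:
            weight = oa_premium if paper["gold_oa"] else 1.0
            total += paper["citations"] * weight
        return total

    papers = [
        {"citations": 40, "gold_oa": True},   # counts as 44.0
        {"citations": 25, "gold_oa": False},  # counts as 25.0
    ]
    print(impact_score(papers))  # 69.0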

Q: How can we accelerate by an order of magnitude in the short term?
A: Any tool that becomes widely used to measure impact affects how science is done. E.g., the H Index. But I’d like to see a proliferation of measures because when you only have one, it reduces cognitive diversity.
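[For readers who haven’t met it: a researcher’s h-index is the largest h such that h of their papers have at least h citations each. A minimal sketch, mine rather than Michael’s:]

    # h-index: the largest h such that h papers have >= h citations each.
    def h_index(citations):
        h = 0
        for i, c in enumerate(sorted(citations, reverse=True), start=1):
            if c >= i:
                h = i      # the i most-cited papers all have >= i citations
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
    print(h_index([25, 8, 5, 3, 3]))  # 3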

Q: Before the Web, Erdős was the roving collaborator. He’d go from place to place and force collaboration. Let’s duplicate that on the Net!
A: He worked 18 hours a day, 365 days a year, high on amphetamines. Not sure that’s the model :) He did lots of small projects. When you have a large project, you bring in the expertise you need. Open collaboration draws in an unpredictable spread of expertise, and that’s often crucial. E.g., Einstein never thought that understanding gravity would require understanding non-Euclidean geometries. He learned that from someone else [missed who]. That’s the sort of thing you get in open collaborations.

Q: You have to have a strong ego to put your out-there paper out there to let everyone pick it apart.
A: Yes. I once asked a friend of mine how he consistently writes edgy blog posts. He replied that it’s because there are some posts he genuinely regrets writing. That takes a particular personality type. But the same is true for publishing papers.
Q: But at least you can blame the editors or peer reviewers.
A: But that’s very modern; peer review only became standard around the 1960s. Of Einstein’s 300 papers, only one was peer reviewed … and that one was rejected. Newton was terribly anguished by the criticism of his papers. Networked science may exacerbate the risk, but it’s always been risky to put your ideas out there.

[Loved this talk.]


What “I know” means

If meaning is use, as per Wittgenstein and J.L. Austin, then what does “know” mean?

I’m going to guess that the most common usage of the term is in the phrase “I know,” as in:

1. “You have to be careful what you take Lipitor with.” “I know.”
2. “The science articles have gotten really hard to read in Wikipedia.” “I know.”
3. “This cookbook thinks you’ll just happen to have strudel dough just hanging around.” “I know.”
4. “The books are arranged by the author’s last name within any one topic area.” “I know.”
5. “They’re closing the Red Line on weekends.” “I know!”

In each of these, the speaker is not claiming to have an inner state of belief that is justifiable and true. The speaker is using “I know” to shape the conversation and the social relationship with the initial speaker.

1., 4. “You can stop explaining now.”
2., 3. “I agree with you. We’re on the same side.”
5. “I agree that it’s outrageous!”

And I won’t even mention words like “surely” and “certainly” that are almost always used to indicate that you’re going to present no evidence for the claim that follows.


[2b2k] Why this article?

A possible explanation of the observation of neutrinos traveling faster than light has been posted at Arxiv.org by Ronald van Elburg. I of course don’t have any of the conceptual apparatus to be able to judge that explanation, but I’m curious about why, among all the proposed explanations, this is the one I’ve now heard about.

In a properly working knowledge ecology, the most plausible explanations would garner the most attention, because to come to light an article would have to pass through competent filters. In the new ecology, it may well be that what gets the most attention are articles that appeal to our lizard brains in various ways: they make overly bold claims, they oversimplify, they confirm prior beliefs, they are more comprehensible to lay people than are ideas that require more training to understand, they have an interesting backstory (“Ashton Kutcher tweets a new neutrino explanation!”)…

By now we are all familiar with the critique of the old idea of a “properly working knowledge ecology”: Its filters were too narrow and were prone to preferring that which was intellectually and culturally familiar. There is a strong case to be made that a more robust ecology is wilder in its differences and disagreements. Nevertheless, it seems to me to be clearly true (i.e., I’m not going to present any evidence to support the following) that to our lizard brains the Internet is a flat rock warmed by a bright sun.

But that is hardly the end of the story. The Internet isn’t one ecology. It’s a messy cascade of intersecting environments. Indeed, the ecology metaphor doesn’t suffice, because each of us pins together our own Net environments by choosing which links to click on, which to bookmark, and which to pass along to our friends. So, I came across the possible neutrino explanation at Metafilter, which I was reading embedded within Netvibes, a feed aggregator that I use as my morning newspaper. A comment at Metafilter pointed to the top comment at Reddit’s AskScience forum on the article, which I turned to because on this sort of question I often find Reddit comment threads helpful. (I also had a meta-interest in how articles circulate.) If you despise Reddit, you would have skipped the Metafilter comment’s referral to that site, but you might well have pursued a different trail of links.

If we take the circulation of Ronald van Elburg’s article as an example, what do we learn? Well, not much because it’s only one example. Nevertheless, I think it at least helps make clear just how complex our “media environment” has become, and some of the effects it has on knowledge and authority.

First, we don’t yet know how ideas achieve status as centers of mainstream contention. Is van Elburg’s article attaining the sort of reliable, referenceable position that provides a common ground for science? It was published at Arxiv, which lets any scientist with an academic affiliation post articles at any stage of readiness. On the other hand, among the thousands of articles posted every day, the Physics Arxiv blog at Technology Review blogged about this one. (Even who’s blogging about what where is complex!) If over time van Elburg’s article is cited in mainstream journals, then, yes, it will count as having vaulted the wall that separates the wannabes from the contenders. But to what extent are articles not published in the prestigious journals capable of being established as touchpoints within a discipline? More important, to what extent does the ecology still center on controversies about which every competent expert is supposed to be informed? How many tentpoles are there in the Big Tent? Is there a Big Tent any more?

Second, as far as I know, we don’t yet have a reliable understanding of the mechanics of the spread of ideas, much less an understanding of how those mechanics relate to the worth of ideas. So, we know that high-traffic sites boost awareness of the ideas they publish, and we know that the mainstream media remain quite influential in either the creation or the amplification of ideas. We know that some community-driven sites (Reddit, 4chan) are extraordinarily effective at creating and driving memes. We also know that a word from Oprah used to move truckloads of books. But if you look past the ability of big sites to set bonfires, we don’t yet understand how the smoke insinuates its way through the forest. And there’s a good chance we will never understand it very fully because the Net’s ecology is chaotic.

Third, I would like to say that it’s all too complex and imbued with value beliefs to be able to decide if the new knowledge ecology is a good thing. I’d like to be perceived as fair and balanced. But the truth is that every time I try to balance the scales, I realize I’ve put my thumb on the side of traditional knowledge to give it heft it doesn’t deserve. Yes, the new chaotic ecology contains more untruths and lies than ever, and they can form a self-referential web that leaves no room for truth or light. At the same time, I’m sitting at breakfast deciding to explore some discussions of relativity by wiping the butter off my finger and clicking a mouse button. The discussions include some raging morons, but also some incredibly smart and insightful strangers, some with credentials and some who prefer not to say. That’s what happens when a population actually engages with its culture. To me, that engagement itself is more valuable than the aggregate sum of stupidity it allows.


(Yes, I know I’m having some metaphor problems. Take that as an indication of the unsettled nature of our thought. Or of bad writing.)


[2b2k] Bookbinding and the Digital Bible

Avi Solomon at BoingBoing has a terrific interview with Michael Greer about the appeal of bookbinding, and about Michael’s “Digital Bible.”

I love the photo:

[Photo: the Digital Bible, a book with ones and zeroes as its text]


[2b2k] Retraction system creaking under the load

According to a post at Nature by Richard Van Noorden, the rate of retracted scientific articles is growing far faster than the rate of published or posted articles. No one is sure why, but it is exposing inconsistencies in policies for dealing with retracted articles.

Suggested reforms include better systems for linking papers to their retraction notices or revisions, more responsibility on the part of journal editors and, most of all, greater transparency and clarity about mistakes in research.

It’s encouraging that it’s taken as obvious that the proper response is links and transparency. Gotta love science.


Erik Martin on what makes Reddit special

Erik Martin, the general manager of Reddit, explains what’s so special about the discussion site. I’m particularly interested in the nature of authority on the site, and its introduction of new journalistic rhetorical forms.


[2b2k] How we assess credibility

Soo Young Rieh is an associate professor at the University of Michigan School of Information. She recently finished a study (funded in part by MacArthur) on how people assess the credibility of sources when they are just searching for information and when they are actually posting information. Her study didn’t focus on a particular age or gender, and found [SPOILER] that we don’t take extra steps to assess the credibility of information when we are publishing it.
