Archive for category open access

[avignon] [2b2k] Robert Darnton on the history of copyright, open access, the dpla…

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

We begin with a report on a Ministerial meeting yesterday here on culture — a dialogue among the stakeholders on the Internet. [No users included, I believe.] All agreed on the principles proposed at Deauville: It is a multi-stakeholder ecosystem that complies with law. In this morning’s discussion, I was struck by the convergence: we all agree about remunerating copyright holders. [Selection effect. I favor copyright and remunerating rights holders, but not as the supreme or exclusive value.] We agree that there are more legal alternatives. We agree that the law needs to be enforced. No one argued with that. [At what cost?] And we all agree we need international cooperation, especially to fight piracy.

Now Robert Darnton, Harvard Librarian, gives an invited talk about the history of copyright.

Darnton: I am grateful to be here. And especially grateful you did not ask me to talk about the death of the book. The book is not dead. More books are being produced in print and online every year than in the previous year. This year, more than 1 million new books will be produced. China has doubled its production of books in the past ten years. Brazil has a booming book industry. Even old countries like the US find book production is increasing. We should not bemoan the death of the book.

Should we conclude that all is well in the world of books? Certainly not. Listen to the lamentations of authors, publishers, booksellers. They are clearly frightened and confused. The ground is shifting beneath their feet and they don’t know where to stake a claim. The pace of tech is terrifying. What took millennia, then centuries, then decades, now happens all the time. Homesteading in the new info ecology is made difficult by uncertainty about copyright and economics.

Throughout early modern Europe, publishing was dominated by guilds of booksellers and printers. Modern copyright did not exist, but booksellers accumulated privileges, which Condorcet objected to. These privileges (AKA patents) gave them the exclusive rights to reproduce texts, with the support of the state. The monarchy in the 17th century eliminated competitors, especially ones in the provinces, reinforcing the guild, thus gaining control of publishing. But illegal production throve. Avignon was a great center of piracy in the 18th century because it was not French. It was surrounded by police intercepting the illegal books. It took a revolution to break the hegemony of the Parisian guild. For two years after the fall of the Bastille, the French press enjoyed liberty. Condorcet and others had argued for the abolition of constraints on the free exchange of ideas. It was a utopian vision that didn’t last long.

Modern copyright began with the 1793 French copyright law that established a new model in Europe. The exclusive right to sell a text was limited to the author for lifetime + 10 years. Meanwhile, the British Statute of Anne in 1710 created copyright. Background: The stationers’ monopoly required booksellers — and all had to be members — to register. The oligarchs of the guild crushed their competitors through monopolies. They were so powerful that they provoked opposition even within the book trade. Parliament rejected the guild’s attempt to renew the licensing act in 1695. The British celebrate this as the beginning of the end of pre-publication censorship.

The booksellers lobbied for the modern concept of copyright. For new works: 14 years, renewable once. At its origin, copyright law tried to strike a balance between the public good and the private benefit of the copyright owner. According to a liberal view, Parliament got the balance right. But the publishers refused to comply, invoking a general principle inherent in common law: When an author creates work, he acquires an unlimited right to profit from his labor. If he sold it, the publisher owned it in perpetuity. This was Diderot’s position. The same argument occurred in France and England.

In England, the argument culminated in the 1774 case Donaldson v. Beckett, which reaffirmed 14 years renewable once. Then we Americans followed in our Constitution and in the first copyright law in 1790 (“An act for the encouragement of learning”, echoing the British 1710 Act): 14 years renewable once.

The debate is still alive. The 1998 copyright extension act in the US was considerably shaped by Jack Valenti and the Hollywood lobby. It extended copyright to life + 70 (or, for corporate works, 95 years from publication). We are thus putting most literature out of the public domain and into copyright that seems perpetual. Valenti was asked if he favored perpetual copyright and said “No. Copyright should last forever minus one day.”

This history is meant to emphasize the interplay of two elements that run right through the copyright debate: a principle directed toward the public good vs. self-interest directed toward private gain. It would be wrong-headed and naive to assert only the former. But to assert only the latter would be cynical. So, do we have the balance right today?

Consider knowledge and power. We all agree that patents help, but no one would want the knowledge of DNA to be exploited as private property. The privatization of knowledge has become an enclosure movement. Consider academic periodicals. Most knowledge first appears in digitized periodicals. The journal article is the principal outlet for the sciences, law, philosophy, etc. Journal publishers therefore control access to most of the knowledge being created, and they charge a fortune. The price of academic journals rose ten times faster than the rate of inflation in the 1990s. The Journal of Comparative Neurology is $29,113/year. The journal Brain costs $23,000. The average list price in chemistry is over $3,000. Most of the research was subsidized by taxpayers. It belongs in the public domain. But commercial publishers have fenced off parts of that domain and exploited it. Their profit margins run as high as 40%. Why aren’t they constrained by the laws of supply and demand? Because they have crowded competitors out, and the demand is not elastic: research libraries cannot cancel their subscriptions without an uproar from the faculty. Of course, professors and students produced the research and provided it for free to the publishers. Academics are therefore complicit. They advance their prestige by publishing in journals, but they fail to understand the damage they’re doing to the Republic of Letters.

How to reverse this trend? Open access journals: journals that are subsidized at the production end and made free to consumers. They get more readers, too, which is not surprising since search engines index them and it’s easy for readers to get to them. Open Access is easy access, and the ease has economic consequences. Doctors, journalists, researchers, housewives, nearly everyone wants information fast and at no cost. Open Access is the answer. That may be a bit simple, but it’s the direction we have to take to address this problem, at least for academic journals.

But the Forum is thinking about other things. I admire Google for its technical prowess, but also because it demonstrated that free access to info can be profitable. But it ran into problems when it began to digitize books and make them available. It got sued for alleged breach of copyright. It tried to settle by turning the project into a gigantic business and sharing the profits with the authors and publishers who had sued it. Libraries had provided the books. Now they’d have to buy them back at a price set by Google. Google was fencing off access to knowledge. A federal judge rejected the settlement because, among other points, it threatened to create a monopoly. By controlling access to books, Google occupied a position similar to that of the guilds in London and Paris.

So why not create a library as great as anything imagined by Google, but that would make works available to users free of charge? Harvard held a workshop on Oct. 1, 2010 to explore this. Like Condorcet, a utopian fantasy? But it turns out to be eminently reasonable. A steering committee, a secretariat, and six workgroups were established. A year later we launched the Digital Public Library of America at a conference hosted by the major cultural institutions in DC, and in April 2013 we’ll have a preliminary version of it.

Let me emphasize two points. 1. The DPLA will serve a wide and varied constituency throughout the US. It will be a force in education, and will provide a stimulus to the economy by putting knowledge to work. 2. It will spread to everyone on the globe. The DPLA’s technical infrastructure is being designed to be interoperable with Europeana, which is aggregating the digital collections of 27 countries. National digital libraries are sprouting up everywhere, even in Mongolia. We need to bring them together. Books have never respected boundaries. Within a few decades, we’ll have worldwide access to all the books in the world, and images, recordings, films, etc.

Of course a lot remains to be done. But, the book is dead? Long live the book!

Q: It is patronizing to think that the USA and Europe will set the policy here. India and China will set this policy.

A: We need international collaboration. And we need an infrastructure that is interoperable.


[2b2k] Will digital scholarship ever keep up?

Scott F. Johnson has posted a dystopic provocation about the present of digital scholarship and possibly about its future.

Here’s the crux of his argument:

… as the deluge of information increases at a very fast pace — including both the digitization of scholarly materials unavailable in digital form previously and the new production of journals and books in digital form — and as the tools that scholars use to sift, sort, and search this material are increasingly unable to keep up — either by being limited in terms of the sheer amount of data they can deal with, or in terms of becoming so complex in terms of usability that the average scholar can’t use it — then the less likely it will be that a scholar can adequately cover the research material and write a convincing scholarly narrative today.

Thus, I would argue that in the future, when the computational tools (whatever they may be) eventually develop to a point of dealing profitably with the new deluge of digital scholarship, the backward-looking view of scholarship in our current transitional period may be generally disparaging. It may be so disparaging, in fact, that the scholarship of our generation will be seen as not trustworthy, or inherently compromised in some way by comparison with what came before (pre-digital) and what will come after (sophisticatedly digital).

Scott tentatively concludes:

For the moment one solution is to read less, but better. This may seem a luddite approach to the problem, but what other choice is there?

First, I should point out that the rest of Scott’s post makes it clear that he’s no Luddite. He understands the advantages of digital scholarship. But I look at this a little differently.

I agree with most of Scott’s description of the current state of digital scholarship and with the inevitability of an ever increasing deluge of scholarly digital material. But, I think the issue is not that the filters won’t be able to keep up with the deluge. Rather, I think we’re just going to have to give up on the idea of “keeping up” — much as newspapers and half-hour news broadcasts have had to give up the pretense that they are covering all the day’s events. The idea of coverage was always an internalization of the limitations of the old media, as if a newspaper, a broadcast, or even the lifetime of a scholar could embrace everything important there is to know about a field. Now the Net has made clear to us what we knew all along: much of what we wanted knowledge to do was a mere dream.

So, for me the question is what scholarship and expertise look like when they cannot attain a sense of mastery by artificially limiting the material with which they have to deal. It was much easier when you only had to read at the pace of the publishers. Now you’d have to read at the pace of the writers…and there are so many more writers! So, lacking a canon, how can there be experts? How can you be a scholar?

I’m bad at predicting the future, and I don’t know if Scott is right that we will eventually develop such powerful search and filtering tools that the current generation of scholars will look like betwixt-and-between fools (or like an “asterisk,” as Scott says). There’s an argument that even if the pace of growth slows, the pace of complexification will increase. In any case, I’d guess that deep scholars will continue to exist because that’s more a personality trait than a function of the available materials. For example, I’m currently reading Armies of Heaven, by Jay Rubenstein. The depth of his knowledge about the First Crusade is astounding. Astounding. As more of the works he consulted come online, other scholars of similar temperament will find it easier to pursue their deep scholarship. They will read less and better not as a tactic but because that’s how the world beckons to them. But the Net will also support scholars who want to read faster and do more connecting. Finally (and to me most interestingly), the Net is already helping us to address the scaling problem by facilitating the move of knowledge from books to networks. Books don’t scale. Networks do. Although, yes, that fundamentally changes the nature of knowledge and scholarship.

[Note: My initial post embedded one draft inside another and was a total mess. Ack. I've cleaned it up - Oct. 26, 2011, 4:03pm edt.]


[berkman] [2b2k] Michael Nielsen on the networking of science

Michael Nielsen is giving a Berkman talk on the networking of science. (It’s his first talk after his book Reinventing Discovery was published.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by telling the story of Tim Gowers, a Fields Medal winner and blogger. (Four of the 42 living Fields winners have started blogs; two of them are still blogging.) In January 2009, Gowers started posting difficult problems on his blog and working on them in the open. Plus he invited the public to post ideas in the comments. He called this the Polymath Project. 170,000 words in the comments later, ideas had been proposed and rapidly improved or discarded. A few weeks later, the problem had been solved at an even higher level of generalization.

Michael asks: Why isn’t this more common? He gives an example of the failure of an interesting idea. It was proposed by a grad student in 2005. Qwiki was supposed to be a super-textbook about quantum mechanics. The site was well built and well marketed. “But science is littered with examples of wikis like this…They are not attracting regular contributors.” Likewise many scientific social networks are ghost towns. “The fundamental problem is one of opportunity costs. If you’re a young scientist, the way you build your career is through the publication of scientific papers…One mediocre crappy paper is going to do more for your career than a series of brilliant contributions to a wiki.”

Why then is the Polymath Project succeeding? It just used an unconventional means to a conventional end: they published two papers out of it. Sites like Qwiki that are ends in themselves go unused. We need a “change in norms in scientific culture” so that when people are making decisions about grants and jobs, people who contribute to unconventional formats are rewarded.

How do you achieve a change in the culture? It’s hard. Take the Human Genome Project. In the 1990s, there wasn’t a lot of advantage to individual scientists in sharing their data. In 1996, the Wellcome Trust held a meeting in Bermuda and agreed on principles saying that if you sequenced more than a thousand base pairs, you needed to release the data to a public database, where it would be put into the public domain. The funding agencies baked those principles into policy. In April 2000, Clinton and Blair urged all countries to adopt similar principles.

For this to work, you need enthusiastic acceptance, not just a stick beating scientists into submission. You need scientists to internalize it. Why? Because you need all sorts of correlative data to make lab data useful. E.g., in the Sloan Digital Sky Survey, a huge part of the project was establishing the calibration lines for the data to have meaning to anyone else.

Many scientists are pessimistic about this change occurring. But there are some hopeful precedents. In 1610 Galileo pointed his telescope at Saturn. He was expecting to see a small disk. But he saw a disk with small knobs on either side — the rings, although he couldn’t resolve the image further. He sent letters to four colleagues, including Kepler, that scrambled his discovery into an anagram. This way, if someone else made the discovery, Galileo could unscramble the letters and prove that he had made the discovery first. Leonardo, Newton, Hooke, and Huygens all did this. Scientific journals helped end this practice. The editors of the first journals had trouble convincing scientists to reveal their info because there was no link between publication and career. The editor of the first scientific journal (Philosophical Transactions of the Royal Society) goaded scientists into publishing by writing to them suggesting that other scientists were about to disclose what the recipients of the letters were working on. As Paul David [Davis? Couldn't find it via Google] says, the change to the modern system was due to “patron pressure.”
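
[A modern analogue of the anagram trick — my own aside, not something Michael presented: publish a cryptographic hash of your claim now, and reveal the text later to prove priority. A minimal Python sketch, with a made-up claim:]

    import hashlib

    # The discovery you want to keep secret for now.
    claim = "Saturn is flanked by two appendages on either side."

    # Publish only the digest: it reveals nothing about the claim,
    # but it commits you to this exact wording.
    commitment = hashlib.sha256(claim.encode("utf-8")).hexdigest()
    print("publish this today:", commitment)

    # Later, reveal the claim; anyone can re-hash it and check that it
    # matches the digest you published earlier.
    assert hashlib.sha256(claim.encode("utf-8")).hexdigest() == commitment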

Michael points out that Galileo immediately announced the discovery of four moons of Jupiter in order to get patronage bucks from the Medicis for the right to name them. [Or, as we would do today, The Comcast Moon, the Staples Moon, and the Gosh Honey Your Hair Smells Great Moon.]

Some new ideas: The Journal of Visualized Experiments videotapes lab work, thus revealing tacit knowledge. GigaScience (from Springer) publishes data sets as first-class objects. Open Research Computation makes code into a first-class object. And blog posts are beginning to show up on Google Scholar (possibly because they’re paying attention to tags?). So, if your post is being cited by lots of articles, your post will show up in Scholar.

[in response to a question] A researcher claimed to have proved that P ≠ NP. One serious mathematician (Cook) said it was a serious attempt at a solution. Mathematicians and others tore it apart on the Web to see if it was right. About a week later, the consensus was that there was a serious obstruction, although they salvaged a small lemma. The process leveraged expertise in many different areas — statistical physics, logic, etc.

Q: [me] Science has been a type of publishing. How does scientific knowledge change when it becomes a type of networking?
A: You can see this beginning to happen in various fields. E.g., people at Google talk about their software as an ecology. [Afterwards, Michael explained that Google developers use a complex ecology of libraries and services with huge numbers of dependencies.] What will it mean when someone says that the Higgs boson has been found at the LHC? There are millions of lines of code, huge data sets. It will be an example of using networked knowledge to draw a conclusion where no single person has more than a tiny understanding of the chain of inferences that led to this result. How do you do peer review of that paper? Peer review can’t mean that it’s been checked, because no one person can check it. No one has all the capability. How do you validate this knowledge? The methods used to validate it are completely ad hoc. E.g., the Intergovernmental Panel on Climate Change has more data than any one person can evaluate. And they don’t have a method. It’s ad hoc. They do a good job, but it’s ad hoc.

Q: The classification of finite simple groups was the same: a series of papers.
A: Followed by a 1,200-page appendix addressing errors.

Q: It varies by science, of course. For practical work, people need access to the data. For theoretical work, the person who makes the single step that solves it should get 98% of the credit. E.g., Newton v. Leibniz on calculus. E.g., Perelman’s approach to the Poincaré conjecture.
A: Yes. Perelman published three papers on a preprint server. Afterward, someone published a paper that filled in the gaps, but Perelman’s was the crucial contribution. This is the normal bickering in science. I would like to see many approaches and gradual consensus. You’ll never have perfect agreement. With transparency, you can go back and see how people came to those ideas.

Q: What is validation? There is a fundamental need for change in the statistical algorithms that many data sets are built on. You have to look at those limitations as well as at the data sets.
A: There are lots of interesting things happening. But I think this is a transient problem. Best practices are still emerging. There are a lot of statisticians on the case. A move toward more reproducible research and more open sharing of code would help. E.g., many random number generators are broken, as is well known. Having the random number generator code in an open repository makes life much easier.
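
[To make that concrete, here is a minimal sketch — my own illustration rather than anything from the talk — of publishing exactly which random number generator and seed produced a result, so others can reproduce it:]

    import json
    import numpy as np

    # Record which generator and seed produced the results, and publish
    # this provenance alongside the code and data.
    seed = 20111024
    rng = np.random.default_rng(seed)  # numpy's default PCG64 generator

    sample = rng.normal(loc=0.0, scale=1.0, size=5)

    provenance = {
        "generator": "numpy.random.default_rng (PCG64)",
        "seed": seed,
        "numpy_version": np.__version__,
    }
    print(json.dumps(provenance, indent=2))
    print(sample)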

Q: The P vs. NP episode left a sense that it was a sprint in response to a crisis, but how can it be done in a more scalable way?
A: People go for the most interesting claims.

Q: You mentioned the Bermuda Principles, and NIH requires open access publication one year after a paper is published. But you don’t see that elsewhere. What are the sociological reasons?
Peter Suber: There’s a more urgent need for medical research. The campaign for open access at NSF is not as large, and the counter-lobby (publishers of scientific journals) is bigger. But Pres. Obama has said he’s willing to do it by executive order if there’s sufficient public support. No sign of action yet.

Q: [peter suber] I want to see researchers enthusiastic about making their research public. How do we construct a link between OA and career?
A: It’s really interesting what’s going on. A lot of discussion about supporting gold OA (publishing in OA journals, as opposed to putting work into an OA repository). Fundamentally, it comes down to a question of values. Can you create a culture in science that views publishing in gold OA journals as better than publishing in prestigious toll journals? The best way perhaps is to make it a public issue. Make it embarrassing for scientists to lock their work away. The Aaron Swartz case has sparked a public discussion of the role of publishers, especially when they’re making 30% profit margins.
Q: Peter: Whenever you raise the idea of tweaking tenure criteria, you unleash a tsunami of academic conservatism, even if you make clear that this would still support the same rigorous standards. Can we change the reward system without waiting for it to evolve?
A: There was a proposal a few years ago that it be done purely algorithmically: produce a number based on the citation index. If that had been done, simple tweaks to the algorithm could have built in incentives, for example: “You get a 10% premium for being in a gold OA journal,” etc. [A toy sketch of such a score follows this exchange.]
Q: [peter] One idea was that your work wouldn’t be noticed by the tenure committee if it wasn’t in an OA repository.
A: SPIRES lets you measure the impact of your preprint articles, which has made it easier for people to assess the effect of OA publishing. You see people looking up the SPIRES number of a scientist they just met. You see scientists bragging about the number of times their slides have been downloaded via Mendeley.
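
[Here is a toy version of the algorithmic score with an OA tweak mentioned above; the 10% figure is just the hypothetical from the answer, and the scoring rule is my own illustration:]

    def impact_score(papers, oa_premium=0.10):
        """Toy metric: sum of citations, with a premium for gold OA papers."""
        score = 0.0
        for citations, is_gold_oa in papers:
            weight = 1.0 + oa_premium if is_gold_oa else 1.0
            score += citations * weight
        return score

    # Three papers with 12, 40, and 7 citations; only the second is in a gold OA journal.
    print(impact_score([(12, False), (40, True), (7, False)]))  # 12 + 44 + 7 = 63.0
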

Q: How can we accelerate by an order of magnitude in the short term?
A: Any tool that becomes widely used to measure impact affects how science is done. E.g., the H Index. But I’d like to see a proliferation of measures because when you only have one, it reduces cognitive diversity.

Q: Before the Web, Erdős was the roving collaborator. He’d go from place to place and force collaboration. Let’s duplicate that on the Net!
A: He worked 18 hours a day, 365 days/year, high on amphetamines. Not sure that’s the model :) He did lots of small projects. When you have a large project, you bring in the expertise you need. Open collaboration draws in an unpredictable spread of expertise, and that’s often crucial. E.g., Einstein never thought that understanding gravity required understanding non-standard geometries. He learned that from someone else [missed who]. That’s the sort of thing you get in open collaborations.

Q: You have to have a strong ego to put your out-there paper out there to let everyone pick it apart.
A: Yes. I once asked a friend of mine how he consistently writes edgy blog posts. He replied that it’s because there are some posts he genuinely regrets writing. That takes a particular personality type. But the same is true for publishing papers.
Q: But at least you can blame the editors or peer reviewers.
A: But that’s very modern; peer review only became standard practice around the 1960s. Of Einstein’s 300 papers, only one was peer reviewed … and that one was rejected. Newton was terribly anguished by the criticism of his papers. Networked science may exacerbate it, but it’s always been risky to put your ideas out there.

[Loved this talk.]


[2b2k] Open bench science

Carl Zimmer at The Loom points to Rosie Redfield’s blogging of her lab work investigating a claim of arsenic-based life forms. It’s a good example of networked science: science that is based on the network model, rather than on a publishing model.

I find open notebook science overall to be fascinating and promising.


New flavor of open

Randy Schekman, the editor of the Proceedings of the National Academy of Sciences and the editor-designate of a new open access journal (online and free), explains how the new journal will work: scientists will edit for scientists, there will be rapid turnaround, and the journal’s acceptance rate for submissions will go way up. He positions it as more scientist-friendly than the Public Library of Science.

The fact that this interview was (admirably) published in Science magazine has some significance as well.

(By the way, the authors of a report on obstacles to open access have left a hefty and useful comment on my post.)


In my continued pursuit of never getting anything entirely right, here’s a comment from Michael Jensen: “Not quite right — the new editor of what I think is a still-unnamed OA biomedical journal was announced, but Randy Schekman currently edits the PNAS, as I read it.” I have edited the above to get it righter. Thanks, Michael!


What’s stopping open access?

A report on a survey of 350 chemists and 350 economists in UK universities leads to the following conclusion about open access publishing:

…our work with researchers on the ground indicates to us that whatever the enthusiasm and optimism within the OA community, it has not spilled into academia to a large extent and has had only a small effect on the publishing habits and perceptions of ordinary researchers, whatever their seniority and whether in Chemistry or Economics.


The report finds that faculty members want to publish in high “impact factor” journals unless they have some specific reason to go the Open Access route, e.g., they need to get something out quickly. The subscriptions their libraries buy mask from them the extent to which their work becomes inaccessible to those who are not at a university.

The report ends with some recommendations for trying to move academics towards OA publishing.


[lodlam] The rise of Linked Open Data

At the Linked Open Data in Libraries, Archives and Museums conf [LODLAM], Jonathan Rees casually offered what I thought was a useful distinction. (Also note that I am certainly getting this a little wrong, and could possibly be getting it entirely wrong.)


Background: RDF is the basic format of data in the Semantic Web and LOD; it consists of statements of the form “A is in some relation to B.”


My paraphrase: Before LOD, we were trying to build knowledge representations of the various realms of the world. Therefore, it was important that the RDF triples expressed were true statements about the world. In LOD, triples are taken as a way of expressing data; take your internal data, make it accessible as RDF, and let it go into the wild…or, more exactly, into the commons. You’re not trying to represent the world; you’re just trying to represent your data so that it can be reused. It’s a subtle but big difference.
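
[To make the distinction concrete, here is a minimal sketch, using the rdflib Python library, of exposing a bit of made-up internal catalog data as RDF triples; the vocabulary choices and URIs are my own illustration:]

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC

    # A made-up record from an internal catalog.
    book = URIRef("http://example.org/catalog/item/42")
    EX = Namespace("http://example.org/terms/")

    g = Graph()
    # Each add() is one triple: "A is in some relation to B."
    g.add((book, DC.title, Literal("Reinventing Discovery")))
    g.add((book, DC.creator, Literal("Michael Nielsen")))
    g.add((book, EX.shelfLocation, Literal("Stacks, 3rd floor")))

    # Serialize as Turtle so the data can be reused by others.
    print(g.serialize(format="turtle"))
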


I also like John Wilbanks’ provocative tweet-length explanation of LOD: “Linked open data is duct tape that some people mistake for infrastructure. Duct tape is awesome.”


Finally, it’s pretty awesome to be at a techie conference where about half the participants are women.


How-to guide for moving a journal to Open Access

The Association for Learning Technology has published a detailed and highly practical guide, based on its own experience, for journals moving toward an Open Access model. Indeed, the guide is of even broader utility than that, since it considers the practicalities of moving from an existing contract with publishers for any reason.

ALT’s journal has been renamed Research in Learning Technology, and it will be fully Open Access as of January 2012. (Thanks to Seb Schmoller for the tip.)
