Archive for category science

[2b2k] The Internet, Science, and Transformations of Knowledge

[Note that this is cross posted at the new Digital Scholarship at Harvard blog.]

Ralph Schroeder and Eric Meyer of the Oxford Internet Institute are giving a talk sponsored by the Harvard Library on the Internet, Science, and Transformations of Knowledge.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Ralph begins by defining e-research as “Research using digital tools and digital data for the distributed and collaborative production of knowledge.” He points to knowledge as the contentious term. “But we’re going to take a crack at why computational methods are such an important part of knowledge.” They’re going to start with theory and then move to cases.

Over the past couple of decades, we’ve moved from talking about supercomputing to the grid to Web 2.0 to clouds and now Big Data, Ralph says. There is continuity, however: it’s all e-research, and to have a theory of how e-research works, you need a few components: 1. Computational manipulability (mathematization) and 2. The social-technical forces that drive that.

Computational manipulability. This is important because mathematics enables consensus and thus collaboration. “High consensus, rapid discovery.”

Research technologies and driving forces. The key to driving knowledge is research technologies, he says. I.e., machines. You also need an organizational component.

Then you need to look at how that plays out in history, physics, astronomy, etc. Not all fields are organized in the same way.

Eric now talks, beginning with a quote from a scholar who says he now has more information than he needs, all without rooting around in libraries. But others complain that we are not asking new enough questions.

He begins with the Large Hadron Collider. It takes lots of people to build it and then to deal with the data it generates. Physics is usually cited as the epitome of e-research. It is the exemplar of how to do big collaboration, he says.

Distributed computation is a way of engaging citizens in science, he says. E.g. Galaxy Zoo, which engages citizens in classifying galaxies. Citizens have also found new types of galaxies there (“green peas”). Another example: the Genetic Association Information Network is trying to find the cause of bipolar disorder. It has now grown into a worldwide collaboration. Another: Structure of Populations, Levels of Abundance, and Status of Humpbacks (SPLASH), a project that requires human brains to match humpback tails. By collaboratively working on data from 500 scientists around the Pacific Rim, patterns of migration have emerged, and it was possible to come up with a count of humpbacks (about 15-17K). We may even be able to find out how long humpbacks live. (It’s at least 120 years because a harpoon head was found in one from a company that went out of business that long ago.)

Ralph looks at e-research in Sweden as an example. They have a major initiative under way trying to combine health data with population data. The Swedes have been doing this for a long time. Each Swede has a unique ID; this requires the trust of the population. The social component that engenders this trust is worth exploring, he says. He points to cases where IP rights have had to be negotiated. He also points to the Pynchon Wiki where experts and the crowd annotate Pynchon’s works. Also, Google Books is a source of research data.

Eric: Has Google taken over scholarly research? 70% of scholars use Google and 66% use Google Scholar. But in the humanities, 59% go to the library. 95% consult peers and experts — they ask people they trust. It’s true in the physical sciences too, he says, although the numbers vary some.

Eric says the digital is still considered a bit dirty as a research tool. If you have too many URLs in your footnotes it looks like you didn’t do any real work, or so people fear.

Ralph: Is e-research old wine in new bottles? Underlying all the different sorts of knowledge is mathematization: a shared symbolic language with which you can do things. You have a physical core that consists of computers around which lots of different scholars can gather. That core has changed over time, but all offer types of computational manipulability. The Pynchon Wiki just needs a server. The LHC needs to be distributed globally across sites with huge computing power. The machines at the core are constantly being refined. Different fields use this power differently, and focus their efforts on using those differences to drive their fields forward. This is true in literature and language as well. These research technologies have become so important since they enable researchers to work across domains. They are like passports across fields.

A scholar who uses this tech may gain social traction. But you also get resistance: “What are these guys doing with computing and Shakespeare?”

What can we do with this knowledge about how knowledge is changing? 1. We can inform funding decisions: What’s been happening in different fields, how they are affected by social organizations, etc. 2. We need a multidisciplinary way of understanding e-research as a whole. We need more than case studies, Ralph says. We need to be aiming at developing a shared platform for understanding what’s going on. 3. Every time you use these techniques, you are either disintermediating data (e.g., Galaxy Zoo) or intermediating (biomedicine). 4. Given that it’s all digital, we as outsiders have tremendous opportunities to study it. We can analyze it. Which fields are moving where? Where are projects being funded and how are they being organized? You can map science better than ever. One project took a large chunk of academic journals and looked in real time at who is reading what, in what domain.

This lets us understand knowledge better, so we can work together better across departments and around the globe.

Q&A

Q: Sometimes you have to take a humanities approach to knowledge. Maybe you need to use some of the old systems investigations tools. Maybe link Twitter to systems thinking.

A: Good point. But caution: I haven’t seen much research on how the next generation is doing research and is learning. We don’t have the good sociology yet to see what difference that makes. Does it fragment their attention? Or is this a good thing?

Q: It’d be useful to know who borrows what books, etc., but there are restrictions in the US. How about in Great Britain?

A: If anything, it’s more restrictive in the UK. In the UK a library can’t even archive a web site without permission.
A: The example I gave of real time tracking was of articles, not books. Maybe someone will track usage at Google Books.

Q: Can you talk about what happens to the experience of interpreting a text when you have so much computer-generated data?

A: In the best cases, it’s both/and. E.g., you can’t read all the 19th century digitized newspapers, but you can compute against it. But you still need to approach it with a thought process about how to interpret it. You need both sets of skills.
A: If someone comes along and says it’s all statistics, the reply is that no one wants to read pure stats. They want to read stats put into words.

Q: There’s a science reader that lets you keep track of which papers are being read.

A: E.g., Mendeley. But it’s a self-selected group who use these tools.

Q: In the physical sciences, the more info that’s out there, it’s hard to tell what’s important.

A: One way to address it is to think about it as a cycle: as a field gets overwhelmed with info, you get tools to concentrate the information. But if you only look at a small piece of knowledge, what are you losing? In some areas, e.g., areas within physics, everyone knows everyone else and what everyone else is doing. Earth sciences is a much broader community.

[Interesting talk. It's orthogonal to my own interests in how knowledge is becoming something that "lives" at the network level, and is thus being redefined. It's interesting to me to see how this looks when sliced through at a different angle.]

[2b2k] Peter Galison on The Collective Author

Harvard professor Peter Galison (he’s actually one of only 24 University Professors, a special honor) is opening a conference on author attribution in the digital age.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He points to the vast increase in the number of physicists involved in an experiment, some of which have 3,000 people working on them. This transforms the role of experiments and how physicists relate to one another. “When CERN says in a couple of months that ‘We’ve found the Higgs particle,’ who is the we?”

He says that there has been a “pseudo-I”: A group that functions under the name of a single author. A generation or two ago this was common: “The Alvarez Group,” “Thorndike Group,” etc. This is like when the works of a Rembrandt would in fact come from his studio. But there’s also “The Collective Group”: a group that functions without that name, often without even a single lead institution. This requires “complex internal regulation, governance, collective responsibility, and novel ways of attributing credit.” So, over the past decades physicists have been asked very fundamental questions about how they want to govern. Those 3,000 people have never all met one another; they’re not even in the same country. So, do they stop the accelerator because of the results from one group? Or, when CERN scientists found data suggesting faster than light neutrinos, the team was not unanimous about publishing those results. When the results were reversed, the entire team suffered some reputational damage. “So, the stakes are very high about how these governance, decision-making, and attribution questions get decided.”

He looks back to the 1960s. There were large bubble chambers kept above their boiling point but under pressure. You’d get beautiful images of particles, and these were the iconic images of physics. But these experiments were at a new, industrial scale for physics. After an explosion in 1965, the labs were put under industrial rules and processes. In 1967 Alan Thorndike at Brookhaven responded to these changes in the ethos of being an experimenter. Rarely is the experimenter a single individual, he said. He is a composite. “He might be 3, 5 or 8, possibly as many as 10, 20, or more.” He “may be spread around geographically…He may be ephemeral…He is a social phenomenon, varied in form and impossible to define precisely.” But he certainly is not (said Thorndike) a “cloistered scientist working in isolation at his laboratory bench.” The thing that is thinking is a “composite entity.” The tasks are not partitioned in simple ways, the way contractors working on a house partition their tasks. Thorndike is talking about tasks in which “the cognition itself does not occur in one skull.”

By 1983, physicists were colliding beams that moved particles out in all directions. Bigger equipment. More particles. More complexity. Now instead of a dozen or two participants, you have 150 or so. Questions arose about what an author is. In July 1988 one of the Stanford collaborators wrote an internal memo saying that all collaborators ought to be listed as authors alphabetically since “our first priority should be the coherence of the group and the de facto recognition that contributions to a piece of physics are made by all collaborators in different ways.” They decided on a rule that avoided the nightmare of trying to give primacy to some. The memo continues: “For physics papers, all physicist members of the collaboration are authors. In addition, the first published paper should also include the engineers.” [Wolowitz! :)]

In the 1990s, rules of authorship got more specific. He points to a particular list of seven very specific rules. “It was a big battle.”

In 1997, when you get to projects as large as ATLAS at CERN, the author count goes up to 2,500. This makes it “harder to evaluate the individual contribution when comparing with other fields in science,” according to a report at the time. With experiments of this size, says Peter, the experimenters are the best source of the review of the results.

Conundrums of Authorship: It’s a community and you’re trying to keep it coherent. “You have to keep things from falling apart” along institutional or disciplinary grounds. E.g., the weak neutral current experiment. The collaborators were divided about whether there were such things. They were mockingly accused of proposing “alternating weak neutral currents,” and this cost them reputationally. But, trying to make these experiments speak in one voice can come at a cost. E.g., suppose 1,900 collaborators want to publish, but 600 don’t. If they speak in one voice, that suppresses dissent.

Then there’s also the question of the “identity of physicists while crediting mechanical, cryogenic, electrical engineers, and how to balance with builders and analysts.” E.g., analysts have sometimes claimed credit because they were the first ones to perceive the truth in the data, while others say that the analysts were just dealing with the “icing.”

Peter ends by saying: These questions go down to our understanding of the very nature of science.

Q: What’s the answer?
A: It’s different in different sciences, each of which has its own culture. Some of these cultures are still emerging. It will not be solved once and for all. We should use those cultures to see what part of evaluations are done inside the culture, and which depend on external review. As I said, in many cases the most serious review is done inside where you have access to all the data, the backups, etc. Figuring out how to leverage those sorts of reviews could help to provide credit when it’s time to promote people. The question of credit between scientists and engineers/technicians has been debated for hundreds of years. I think we’ve begun to shed some of our class anxiety, i.e., the assumption that hand work is not equivalent to head work, etc. A few years ago, some physicists would say that nanotech is engineering, not science; you don’t hear that so much any more. When a Nobel prize in 1983 went to an engineer, it was a harbinger.

Q: Have other scientists learned from the high energy physicists about this?
A: Yes. There are different models. Some big science gets assimilated to a culture that is more like a big engineering process. E.g., there’s no public awareness of the lead designers of the 747 we’ve been flying for 50 years, whereas we know the directors of Hollywood films. Authorship is something we decide. That the 747 has no author but Hunger Games does was not decreed by Heaven. Big plasma physics is treated more like industry, in part because it’s conducted within a secure facility. The astronomers have done many admirable things. I was on a prize committee that gave the award to a group because it was a collective activity. Astronomers have been great about distributing data. There’s Galaxy Zoo, and some “zookeepers” have been credited as authors on some papers.

Q: The credits are getting longer on movies as the specializations grow. It’s a similar problem. They tell you who did what in each category. In high energy physics, scientists see becoming too specialized as a bad thing.
A: In the movies many different roles are recognized. And there are questions of distribution of profits, which is not so analogous to physics experiments. Physicists want to think of themselves as physicists, not as sub-specialists. If you are identified as, for example, the person who wrote the Monte Carlo, people may think that you’re “just a coder” and write you off. The first Ph.D. in physics submitted at Harvard was on the Bohr model; the student was told that it was fine but he had to do an experiment because theoretical physics might be great for Europe but not for the US. It’s naive to think that physicists are Da Vincis who do everything; the idea of what counts as being a physicist is changing, and that’s a good thing.

[I wanted to ask if (assuming what may not be true) the Internet leads to more of the internal work being done visibly in public, might this change some of the governance since it will be clearer that there is diversity and disagreement within a healthy network of experimenters. Anyway, that was a great talk.]

[2b2k] The Net as paradigm

Edward Burman recently sent me a very interesting email in response to my article about the 50th anniversary of Thomas Kuhn’s The Structure of Scientific Revolutions. So I bought his 2003 book Shift!: The Unfolding Internet – Hype, Hope and History (hint: If you buy it from Amazon, check the non-Amazon sellers listed there) which arrived while I was away this week. The book is not very long — 50,000 words or so — but it’s dense with ideas. For example, Edward argues in passing that the Net exploits already-existing trends toward globalization, rather than leading the way to it; he even has a couple of pages on Heidegger’s thinking about the nature of communication. It’s a rich book.

Shift! applies The Structure of Scientific Revolutions to the Internet revolution, wondering what the Internet paradigm will be. The chapters that go through the history of failed attempts to understand the Net (the “pre-paradigms”) are fascinating. Much of Edward’s analysis of business’ inability to grasp the Net mirrors Cluetrain’s themes. (In fact, I had the authorial d-bag reaction of wishing he had referenced Cluetrain…until I realized that Edward probably had the same reaction to my later books which mirror ideas in Shift!) The book is strong in its presentation of Kuhn’s ideas, and has a deep sense of our cultural and philosophical history.

All that would be enough to bring me to recommend the book. But Edward admirably jumps in with a prediction about what the Internet paradigm will be:

“This…brings us to the new paradigm, which will condition our private and business lives as the twenty-first century evolves. It is a simple paradigm, and may be expressed in synthetic form in three simple words: ubiquitous invisible connectivity. That is to say, when the technologies, software and devices which enable global connectivity in real time become so ubiquitous that we are completely unaware of their presence…We are simply connected.” [p. 170]

It’s unfair to leave it there since the book then elaborates on this idea in very useful ways. For example, he talks about the concept of “e-business” as being a pre-paradigm, and the actual paradigm being “The network itself becomes the company,” which includes an erosion of hierarchy by networks. But because I’ve just written about Kuhn, I found myself particularly interested in the book’s overall argument that Kuhn gives us a way to understand the Internet. Is there an Internet paradigm shift?

There are two ways to take this.

First, is there a paradigm by which we will come to understand the Internet? Edward argues yes, we are rapidly settling into the paradigmatic understanding of the Net. In fact, he guesses that “the present revolution [will] be completed and the new paradigm of being [will] be in force” in “roughly five to eight years” [p. 175]. He sagely points to three main areas where he thinks there will be sufficient development to enable the new paradigm to take root: the rise of the mobile Internet, the development of productivity tools that “facilitate improvements in the supply chain” and marketing, and “the increased deployment of what have been termed social applications, involving education and the political sphere of national and local government.” [pp. 175-176] Not bad for 2003!

But I’d point to two ways, important to his argument, in which things have not turned out as Edward thought. First, the 5-8 years after the book came out were marked by a continuing series of disruptive Internet developments, including general purpose social networks, Wikipedia, e-books, crowdsourcing, YouTube, open access, open courseware, Khan Academy, etc. etc. I hope it’s obvious that I’m not criticizing Edward for not being prescient enough. The book is pretty much as smart as you can get about these things. My point is that the disruptions just keep coming. The Net is not yet settling down. So we have to ask: Is the Net going to enable continuous disruption and self-transformation? If so will it be captured by a paradigm? (Or, as M. Night Shyamalan might put it, is disruption the paradigm?)

Second, after listing the three areas of development over the next 5-8 years, the book makes a claim central to the basic formulation of the new paradigm Edward sees emerging: “And, vitally, for thorough implementation [of the paradigm] the three strands must be invisible to the user: ubiquitous and invisible connectivity.” [p. 176] If the invisibility of the paradigm is required for its acceptance, then we are no closer to that event, for the Internet remains perhaps the single most evident aspect of our culture. No other cultural object is mentioned as many times in a single day’s newspaper. The Internet, and the three components the book points to, are more evident to us than ever. (The exception might be innovations in logistics and supply chain management; I’d say Internet marketing remains highly conspicuous.) We’ve never had a technology that so enabled innovation and creativity, but there may well come a time when we stop focusing so much cultural attention on the Internet. We are not close yet.

Even then, we may not end up with a single paradigm of the Internet. It’s really not clear to me that the attendees at ROFLcon have the same Net paradigm as less Internet-besotted youths. Maybe over time we will all settle into a single Internet paradigm, but maybe we won’t. And we might not because the forces that bring about Kuhnian paradigms are not at play when it comes to the Internet. Kuhnian paradigms triumph because disciplines come to us through institutions that accept some practices and ideas as good science; through textbooks that codify those ideas and practices; and through communities of professionals who train and certify the new scientists. The Net lacks all of that. Our understanding of the Net may thus be as diverse as our cultures and sub-cultures, rather than being as uniform and enforced as, say, genetics’ understanding of DNA is.

Second, is the Internet affecting what we might call the general paradigm of our age? Personally, I think the answer is yes, but I wouldn’t use Kuhn to explain this. I think what’s happening — and Edward agrees — is that we are reinterpreting our world through the lens of the Internet. We did this when clocks were invented and the world started to look like a mechanical clockwork. We did this when steam engines made society and then human motivation look like the action of pressures, governors, and ventings. We did this when telegraphs and then telephones made communication look like the encoding of messages passed through a medium. We understand our world through our technologies. I find (for example) Lewis Mumford more helpful here than Kuhn.

Now, it is certainly the case that reinterpreting our world in light of the Net requires us to interpret the Net in the first place. But I’m not convinced we need a Kuhnian paradigm for this. We just need a set of properties we think are central, and I think Edward and I agree that these properties include the abundant and loose connections, the lack of centralized control, the global reach, the ability of everyone (just about) to contribute, the messiness, the scale. That’s why you don’t have to agree about what constitutes a Kuhnian paradigm to find Shift! fascinating, for it helps illuminate the key question: How are the properties of the Internet becoming the properties we see in — or notice as missing from — the world outside the Internet?

Good book.

[2b2k] Pyramid-shaped publishing model results in cheating on science?

Carl Zimmer has a fascinating article in the NYTimes, which is worth 1/10th of your NYT allotment. (Thank you for ironically illustrating the problem with trying to maintain knowledge as a scarce resource, NYT!)

Carl reports on what may be a growing phenomenon (or perhaps, as the article suggests, the bugs of the old system may just now be more apparent) of scientists fudging results in order to get published in the top journals. From my perspective the article provides yet another illustration of how the old paper-based strictures on scientific knowledge, caused by the scarcity of publishing outlets, result not only in a reduction in the flow of knowledge, but in a degradation of the quality of knowledge.

Unfortunately, the availability of online journals (many of which are peer-reviewed) may not reduce the problem much even though they open up the ol’ knowledge nozzle to 11 on the firehose dial. As we saw when the blogosphere first emerged, there is something like a natural tendency for networked ecosystems to create hubs with a lot of traffic, along with a very long tail. So, even with higher capacity hubs, there may still be some pressure to fudge results in order to get noticed by these hubs, especially since tenure decisions continue to place such high value on a narrow understanding of “impact.”

But: 1. With a larger aperture, there may be less pressure. 2. When readers are also commentators and raters, bad science may be uncovered faster and more often. Or so we can hope.

(There are the very beginnings of a Reddit discussion of Carl’s article here.)

[2b2k] Structure of Scientific Revolutions, 50 years later

The Chronicle of Higher Ed asked me to write a perspective on Thomas Kuhn’s The Structure of Scientific Revolutions since this is the 50th year since it was published. It’s now posted.

[2b2k] Astounding two-minute video edit from NASA’s Cassini and Voyager missions – Only if you love Saturn, Jupiter, and, you know, the Universe

Outer Space from Sander van den Berg on Vimeo.

[2b2k] TIL: Edward Jenner’s smallpox paper was rejected by the Royal Society

Edward Jenner is credited as the discoverer (or perhaps inventor would be the more apt word) of vaccination as a technique to prevent smallpox. That’s pretty much all that I knew, except for the story about milkmaids who got cowpox not getting smallpox. But I just read a really interesting article about the history of smallpox at the National Institutes of Health, by Stefan Riedel.

“TIL” is Reddit-speak for “Today I learned.” And today I also learned that “As early as 430 BC, survivors of smallpox were called upon to nurse the afflicted” in order to protect them. Today I also learned that “Inoculation…was likely practiced in Africa, India, and China long before the 18th century, when it was introduced to Europe.” And today I also learned that “It was the continued advocacy of the English aristocrat Lady Mary Wortley Montague that was responsible for the introduction of variolation [inoculation] in England.”

[2b2k] The next Darwin is a we

Sebastian Benthall has a fervent post about the need for open networks in science, inspired by an awesome talk by the awesome Victoria Stodden.

Along the way, he offers a correction (or extension, perhaps) of a point that I make in 2b2k: the next Darwin is likely to develop her work within an open network that adds value to her work. In some real sense the knowledge lives in that network. Sebastian responds:

He’s right, except maybe for one thing, which is that this digital dialectic (or pluralectic) implies that “the next Darwin” isn’t just one dude, Darwin, with his own ‘-ism’ and pernicious Social adherents. Rather, it means that the next great theory of the origin of species is going to be built by a massive collaborative effort in which lots of people will take an active part. The historical record will show their contributions not just with the clumsy granularity of conference publications and citations, but with minute granularity of thousands of traced conversations. The theory itself will probably be too complicated for any one person to understand, but that’s OK, because it will be well architected and there will be plenty of domain experts to go to if anyone has problems with any particular part of it. And it will be growing all the time and maybe competing with a few other theories.

I love the point.

(Nit: I want to clarify, however, that I wasn’t saying that this next Darwin’s web would consist only of “pernicious Social adherents.” Throughout 2b2k I try to make the point that networked knowledge has value mainly because it includes difference and disagreement. When it does not, it fulfills the nightmare of the echo chamber.)

[2b2k] The corruption of impact

According to a survey published in Science [abstract][Slashdot] scientists are routinely pressured to include superfluous references in their papers in order to boost the Impact Factor of the journal publishing their paper. The Impact Factor is (roughly) a measure of the importance/influence of a journal, based on a two-year average of how often its papers are cited. Careers are made by publishing in high Impact Factor journals.
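
To make the arithmetic concrete, here is a minimal sketch in Python of the usual two-year calculation: citations received this year to a journal's papers from the previous two years, divided by the number of papers it published in those two years. The journal and all of its numbers are hypothetical, invented only to show how a few coerced references could move the score; nothing here comes from the survey.

```python
def impact_factor(citations_this_year: int, papers_prior_two_years: int) -> float:
    """Two-year impact factor: citations received this year to papers from the
    previous two years, divided by the number of papers from those two years."""
    if papers_prior_two_years <= 0:
        raise ValueError("need at least one paper in the two-year window")
    return citations_this_year / papers_prior_two_years


# Hypothetical journal: 200 papers over the previous two years,
# cited 500 times this year.
baseline = impact_factor(500, 200)

# Same hypothetical journal if each of 100 newly accepted papers is pressured
# into adding 2 superfluous references to the journal's own recent papers.
padded = impact_factor(500 + 100 * 2, 200)

print(baseline, padded)  # prints: 2.5 3.5
```

In this made-up case, two extra references per accepted paper would lift the score from 2.5 to 3.5 without a single additional reader.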

This sort of corruption (which I talk about a bit in Too Big to Know) might seem like an inevitable imprecision in how we gauge something as vague as “influence” if alternatives were not becoming available. Services like Mendeley can provide real-time readouts of which articles are being read and commented on. Google likewise can see how often articles are being linked to. Facebook can see how articles are being passed around social networks, some of which are quite expert. It would of course be good to have measures not gated by commercial entities. In any case, institutions of knowledge are currently relying upon an instrument that was always too blunt and is now known to be corrupt.
