[Note that this is cross posted at the new Digital Scholarship at Harvard blog.]
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Ralph begins by defining e-research as “Research using digital tools and digital data for the distributed and collaborative production of knowledge.” He points to knowledge as the contentious term. “But we’re going to take a crack at why computational methods are such an important part of knowledge.” They’re going to start with theory and then move to cases.
Over the past couple of decades, we’ve moved from talking about supercomputing to the grid to Web 2.0 to clouds and now Big Data, Ralph says. There is continuity, however: it’s all e-research, and to have a theory of how e-research works, you need a few components: 1. Computational manipulability (mathematization) and 2. The social-technical forces that drive that.
Computational manipulability. This is important because mathematics enables consensus and thus collaboration. “High consensus, rapid discovery.”
Research technologies and driving forces. The key to driving knowledge is research technologies, he says. I.e., machines. You also need an organizational component.
Then you need to look at how that plays out in history, physics, astronomy, etc. Not all fields are organized in the same way.
Eric now talks, beginning with a quote from a scholar who says he now has more information then he needs, all without rooting around in libraries. But others complain that we are not asking new enough questions.
He begins with the Large Hadron Collider. It takes lots of people to build it and then to deal with the data it generates. Physics is usually cited as the epitome of e-research. It is the exemplar of how to do big collaboration, he says.
Distributed computation is a way of engaging citizens in science, he says. E.g. Galaxy Zoo, which engages citizens in classifying galaxies. Citizens have also found new types of galaxies (“green peas”), etc. there. Another example: the Genetic Association Information Network is trying to find the cause of bipolarism. It has now grown into a worldwide collaboration. Another: Structure of Populations, Levels of Abundance, and Status of Humpbacks (SPLASH), a project that requires human brains to match humpback tails. By collaboratively working on data from 500 scientists around the Pacific Rim, patterns of migration have emerged, and it was possible to come up with a count of humpbacks (about 15-17K). We may even be able to find out how long humpbacks live. (It’s a least 120 years because a harpoon head was found in one from a company that went out of business that long ago.)
Ralph looks at e-research in Sweden as an example. They have a major initiative under way trying to combine health data with population data. The Swedes have been doing this for a long time. Each Swede has a unique ID; this requires the trust of the population. The social component that engenders this trust is worth exploring, he says. He points to cases where IP rights have had to be negotiated. He also points to the Pynchon Wiki where experts and the crowd annotate Pynchon’s works. Also, Google Books is a source of research data.
Eric: Has Google taken over scholarly research? 70% of scholars use Google and 66% use Google Scholar. But in the humanities, 59% go to the library. 95% consult peers and experts — they ask people they trust. It’s true in the physical sciences too, he says, although the numbers vary some.
Eric says the digital is still considered a bit dirty as a research tool. If you have too many URLS in your footnotes it looks like you didn’t do any real work, or so people fear.
Ralph: Is e-research old wine in new bottles? Underlying all the different sorts of knowledge is mathematization: a shared symbolic language with which you can do things. You have a physical core that consists of computers around which lots of different scholars can gather. That core has changed over time, but all offer types of computational manipulability. The Pynchon Wiki just needs a server. The LHC needs to be distributed globally across sites with huge computing power. The machines at the core are constantly being refined. Different fields use this power differently, and focus their efforts on using those differences to drive their fields forward. This is true in literature and language as well. These research technologies have become so important since they enable researchers to work across domains. They are like passports across fields.
A scholar who uses this tech may gain social traction. But you also get resistance: “What are these guys doing with computing and Shakespeare?”
What can we do with this knowledge about how knowledge is changing? 1. We can inform funding decisions: What’s been happening in different fields, how they affected by social organizations, etc. 2. We need a multidisciplinary way of understanding e-research as a whole. We need more than case studies, Ralph says. We need to be aiming at developing a shared platform for understanding what’s going on. 3. Every time you use these techniques, you are either disintermediating data (e.g., Galaxy Zoo) or intermediating (biomedicine). 4. Given that it’s all digital, we as outsiders have tremendous opportunities to study it. We can analyze it. Which fields are moving where? Where are projects being funded and how are they being organized? You can map science better than ever. One project took a large chunk of academic journals and looked in real time at who is reading what, in what domain.
This lets us understand knowledge better, so we can work together better across departments and around the globe.
Q: Sometimes you have to take a humanities approach to knowledge. Maybe you need to use some of the old systems investigations tools. Maybe link Twitter to systems thinking.
A: Good point. But caution: I haven’t seen much research on how the next generation is doing research and is learning. We don’t have the good sociology yet to see what difference that makes. Does it fragment their attention? Or is this a good thing?
Q: It’d be useful to know who borrows what books, etc., but there are restrictions in the US. How about in Great Britain?
A: If anything, it’s more restrictive in the UK. In the UK a library can’t even archive a web site without permission.
A: The example I gave of real time tracking was of articles, not books. Maybe someone will track usage at Google Books.
Q: Can you talk about what happens to the experience of interpreting a text when you have so much computer-generated data?
A: In the best cases, it’s both/and. E.g., you can’t read all the 19th century digitized newspapers, but you can compute against it. But you still need to approach it with a thought process about how to interpret it. You need both sets of skills.
A: If someone comes along and says it’s all statistics, the reply is that no one wants to read pure stats. They want to read stats put into words.
Q: There’s a science reader that lets you keep track of which papers are being read.
A: E.g., Mendeley. But it’s a self-selected group who use these tools.
Q: In the physical sciences, the more info that’s out there, it’s hard to tell what’s important.
A: One way to address it is to think about it as a cycle: as a field gets overwhelmed with info, you get tools to concentrate the information. But if you only look at a small piece of knowledge, what are you losing? In some areas, e.g., areas within physics, everyone knows everyone else and what everyone else is doing. Earth sciences is a much broader community.
[Interesting talk. It's orthogonal to my own interests in how knowledge is becoming something that "lives" at the network level, and is thus being redefined. It's interesting to me to see how this look when sliced through at a different angle.]