Archive for category science

[2b2k] Science as social object

An article published in Science on Thursday, securely locked behind a paywall, paints a mixed picture of science in the age of social media. In “Science, New Media, and the Public,” Dominique Brossard and Dietram A. Scheufele urge action so that science will be judged on its merits as it moves through the Web. That’s a worthy goal, and it’s an excellent article. Still, I read it with a sense that something was askew. I think ultimately it’s something like an old vs. new media disconnect.

The authors begin by noting research that suggests that “online science sources may be helping to narrow knowledge gaps” across educational levels[1]. But all is not rosy. Scientists are going to have “to rethink the interface between the science community and the public.” They point to three reasons.

First, the rise of online media has reduced the amount of time and space given to science coverage by traditional media [2].

Second, the algorithmic prioritizing of stories takes editorial control out of the hands of humans who might make better decisions. The authors point to research that “shows that there are often clear discrepancies between what people search for online, which specific areas are suggested to them by search engines, and what people ultimately find.” The results provided by search engines “may all be linked in a self-reinforcing informational spiral…”[3] This leads them to ask an important question:

Is the World Wide Web opening up a new world of easily accessible scientific information to lay audiences with just a few clicks? Or are we moving toward an online science communication environment in which knowledge gain and opinion formation are increasingly shaped by how search engines present results, direct traffic, and ultimately narrow our informational choices? Critical discussions about these developments have mostly been restricted to the political arena…

Third, we are debating science differently because the Web is social. As an example they point to the fact that “science stories usually…are embedded in a host of cues about their accuracy, importance, or popularity,” from tweets to Facebook “Likes.” “Such cues may add meaning beyond what the author of the original story intended to convey.” The authors cite a recent conference [4] where the tone of online comments turned out to affect how people took the content. For example, an uncivil tone “polarized the views….”

They conclude by saying that we’re just beginning to understand how these Web-based “audience-media interactions” work, but that the opportunity and risk are great, so more research is greatly needed:

Without applied research on how to best communicate science online, we risk creating a future where the dynamics of online communication systems have a stronger impact on public views about science than the specific research that we as scientists are trying to communicate.

I agree with so much of this article, including its call for action, yet it felt odd to me that scientists will be surprised to learn that the Web does not convey scientific information in a balanced and impartial way. You are only surprised by this if you think that the Web is a medium. A medium is that through which content passes. A good medium doesn’t corrupt the content; it conveys signal with a minimum of noise.

But unlike any medium since speech, the Web isn’t a passive channel for the transmission of messages. Messages only move through the Web because we, the people on the Web, find them interesting. For example, I’m moving (infinitesimally, granted) this article by Brossard and Scheufele through the Web because I think some of my friends and readers will find it interesting. If someone who reads this post then tweets about it or about the original article, it will have moved a bit further, but only because someone cared about it. In short, we are the medium, and we don’t move stuff that we think is uninteresting and unimportant. We may move something because it’s so wrong, because we have a clever comment to make about it, or even because we misunderstand it, but without our insertion of ourselves in the form of our interests, it is inert.

So, the “dynamics of online communication systems” are indeed going to have “a stronger impact on public views about science” than the scientific research itself does because those dynamics are what let the research have any impact beyond the scientific community. If scientific research is going to reach beyond those who have a professional interest in it, it necessarily will be tagged with “meaning beyond what the author of the original story intended to convey.” Those meanings are what we make of the message we’re conveying. And what we make of knowledge is the energy that propels it through the new system.

We therefore cannot hope to peel the peer-to-peer commentary from research as it circulates broadly on the Net, not that the Brossard and Scheufele article suggests that. Perhaps the best we can do is educate our children better, and encourage more scientists to dive into the social froth as the place where their research is having its broadest effect.

 


Notes, copied straight from the article:

[1] M. A. Cacciatore, D. A. Scheufele, E. A. Corley, Public Underst. Sci.; 10.1177/0963662512447606 (2012).

[2] C. Russell, in Science and the Media, D. Kennedy, G. Overholser, Eds. (American Academy of Arts and Sciences, Cambridge, MA, 2010), pp. 13–43.

[3] P. Ladwig et al., Mater. Today 13, 52 (2010).

[4] P. Ladwig, A. Anderson, abstract, Annual Conference of the Association for Education in Journalism and Mass Communication, St. Louis, MO, August 2011; www.aejmc.com/home/2011/06/ctec-2011-abstracts


[eim][2b2k] The DSM — never entirely correct

The American Psychiatric Association has approved its new manual of diagnoses — Diagnostic and Statistical Manual of Mental Disorders — after five years of controversy [nytimes].

For example, it has removed Asperger’s as a diagnosis, lumping it in with autism, but it has split out hoarding from the more general category of obsessive-compulsive disorder. Lumping and splitting are the two most basic activities of cataloguers and indexers. There are theoretical and practical reasons for sometimes lumping things together and sometimes splitting them, but they also characterize personalities. Some of us are lumpers, and some of us are splitters. And all of us are a bit of each at various times.

The DSM runs into the problems faced by all attempts to classify a field. Attempts to come up with a single classification for a complex domain try to impose an impossible order:

First, there is rarely (ever?) universal agreement about how to divvy up a domain. There are genuine disagreements about which principles of organization ought to be used, and how they apply. Then there are the Lumper vs. the Splitter personalities.

Second, there are political and economic motivations for dividing up the world in particular ways.

Third, taxonomies are tools. There is no one right way to divide up the world, just as there is no one way to cut a piece of plywood and no one right thing to say about the world. It depends what you’re trying to do. DSM has conflicting purposes. For one thing, it affects treatment. For example, the NY Times article notes that the change in the classification of bipolar disease “could ‘medicalize’ frequent temper tantrums,” and during the many years in which the DSM classified homosexuality as a syndrome, therapists were encouraged to treat it as a disease. But that’s not all the DSM is for. It also guides insurance payments, and it affects research.

Given this, do we need the DSM? Maybe for insurance purposes. But not as a statement of where nature’s joints are. In fact, it’s not clear to me that we even need it as a single source to define terms for common reference. After all, biologists don’t agree about how to classify species, but that science seems to be doing just fine. The Encyclopedia of Life takes a really useful approach: each species gets a page, but the site provides multiple taxonomies so that biologists don’t have to agree on how to lump and split all the forms of life on the planet.

If we do need a single diagnostic taxonomy, DSM is making progress in its methodology. It has more publicly entered the fray of argument, it has tried to respond to current thinking, and it is now going to be updated continuously, rather than every 5 years. All to the good.

But the rest of its problems are intrinsic to its very existence. We may need it for some purposes, but it is never going to be fully right…because tools are useful, not true.


[2b2k] The moment for science

And one more thing about my previous post: I understand that when Heidegger was writing Being and Time in the 1920s, it was important to try to relax our culture’s commitment to scientific objectivity in order to allow more types of truths to appear – more ways that the world shows itself to us.

Almost a hundred years later, with a brand new medium for knowledge, truth, and disclosure, it is time to re-assert science’s privileged (yet still human and imperfect) position as we try to come to agreement across cultures about what we need to do in order to live together on this earth.

In my opinion.


[2b2k] Facts, truths, and meta-knowledge

Last night I gave a talk at the Festival of Science in Genoa (or, as they say in Italy, Genova). I was brought over by Codice Edizioni, the publisher of the just-released Italian version of Too Big to Know (or, as they say in Italy “La Stanza Intelligente” (or as they say in America, “The Smart Room”)). The event was held in the Palazzo Ducale, which ain’t no Elks Club, if you know what I mean. And if you don’t know what I mean, what I mean is that it’s a beautiful, arched, painted-ceiling room that holds 800 people and one intimidated American.

[Image: Genova, Palazzo Ducale]


After my brief talk, Serena Danna of Corriere della Sera interviewed me. She’s really good. For example, her first question was: If the facts no longer have the ability to settle arguments the way we hoped they would, then what happens to truth?


Yeah, way to pitch the ol’ softballs, Serena!


I wasn’t satisfied with my answer, which had three parts. (1) There are facts. The world is one way and not all the other ways that it isn’t. You are not free to make up your own facts. [Yes, I'm talking to you, Mitt!] (2) The basing of knowledge primarily on facts is a relatively new phenomenon. (3) I explicitly invoked Heidegger’s concept of truth, with a soupçon of pragmatism’s view of truth as a tool intended to serve a purpose.


Meanwhile, I’ve been watching The Heidegger Circle mailing list contort itself trying to understand Heidegger’s views about the world that existed before humans entered the scene. Was there Being? Were there beings? It seems to me that any answer has to begin by saying, “Of course the world existed before we did.” But not everyone on the list is comfortable with a statement that simple. Some seem to think that acknowledging that most basic fact somehow diminishes Heidegger’s analysis of the relation of Being and disclosure. Yo, Heideggerians! The world shows itself to us as independent of us. We were born into it, and it keeps going after we’ve died. If that’s a problem for your philosophy, then your philosophy is a problem. And for all of the problems with Heidegger’s philosophy, that just isn’t one. (To be fair, no one on the list suggests that the existence of the universe depends upon our awareness of it, although some are puzzled about how to maintain Heidegger’s conception of “world” (which does seem to depend on us) with that which survives our awareness of it. Heidegger, after all, offers phenomenological ontology, so there is a question about what Being looks like when there is no one to show itself to.)


So, I wasn’t very happy with what I said about truth last night. I said that I liked Heidegger’s notion that truth is the world showing itself to us, and it shows itself to us differently depending on our projects. I’ve always liked this idea for a few reasons. First, it’s phenomenologically true: the onion shows itself differently whether you’re intending to cook it, whether you’re trying to grow it as a cash crop, whether you’re trying to make yourself cry, whether you’re trying to find something to throw at a bad actor, etc. Second, because truth is the way the world shows itself, Heidegger’s sense contains the crucial acknowledgement that the world exists independently of us. Third, because this sense of truth looks at our projects, it contains the crucial acknowledgement that truth is not independent of our involvement in the world (which Heidegger accurately characterizes not with the neutral term “involvement” but as our caring about what happens to us and to our fellow humans). Fourth, this gives us a way of thinking about truth without the correspondence theory’s schizophrenic metaphysics that tells us that we live inside our heads, and our mental images can either match or fail to match external reality.


But Heidegger’s view of truth doesn’t do the job that we want done when we’re trying to settle disagreements. Heidegger observes (correctly in my and everybody’s opinion) that different fields have different methodologies for revealing the truth of the world. He speaks coldly (it seems to me) of science, and warmly of poetry. I’m much hotter on science. Science provides a methodology for letting the world show itself (= truth) that is reproducible precisely so that we can settle disputes. For settling disputes about what the world is like regardless of our view of it, science has priority, just as the legal system has priority for settling disputes over the law.


This matters a lot not just because of the spectacular good that science does, but because the question of truth only arises because we sense that something is hidden from us. Science does not uncover all truths but it uniquely uncovers truths about which we can agree. It allows the world to speak in a way that compels agreement. In that sense, of all the disciplines and methodologies, science is the closest to giving the earth we all share its own authentic voice. That about which science cannot speak in a compelling fashion across all cultures and starting points is simply not subject to scientific analysis. Here the poets and philosophers can speak and should be heard. (And of course the compulsive force science manifests is far from beyond resistance and doubt.)


But, when we are talking about the fragmenting of belief that the Internet facilitates, and the fact that facts no longer settle arguments across those gaps, then it is especially important that we commit to science as the discipline that allows the earth to speak of itself in its most compelling terms.


Finally, I was happy that last night I did manage to say that science provides a model for trying to stay smart on the Internet because it is highly self-aware about what it knows: it does not simply hold on to true statements, but is aware of the methodology that led us to see those statements as true. This type of meta awareness — not just within the realm of science — is crucial for a medium as open as the Internet.


Obesity is good for your heart

From TheHeart.org, an article by Lisa Nainggolan:

Gothenburg, Sweden – Further support for the concept of the obesity paradox has come from a large study of patients with acute coronary syndrome (ACS) in the Swedish Coronary Angiography and Angioplasty Registry (SCAAR) [1]. Those who were deemed overweight or obese by body-mass index (BMI) had a lower risk of death after PCI [percutaneous coronary intervention, aka angioplasty] than normal-weight or underweight participants up to three years after hospitalization, report Dr Oskar Angerås (University of Gothenburg, Sweden) and colleagues in their paper, published online September 5, 2012 in the European Heart Journal.

Can confirm. My grandmother in the 1930s was instructed to make sure she fed her husband lots and lots of butter to lubricate his heart after a heart attack. This proved to work extraordinarily well, at least until his next heart attack.

I refer once again to the classic 1999 The Onion headline: Eggs Good for You This Week.


[2b2k][eim]Digital curation

I’m at the “Symposium on Digital Curation in the Era of Big Data” held by the Board on Research Data and Information of the National Research Council. These liveblog notes cover (in some sense — I missed some folks, and have done my usual spotty job on the rest) the morning session. (I’m keynoting in the middle of it.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.


Alan Blatecky [pdf] from the National Science Foundation says science is being transformed by Big Data. [I can't see his slides from the panel at front.] He points to the increase in the volume of data, but we haven’t paid enough attention to the longevity of the data. And, he says, some data is centralized (LHC) and some is distributed (genomics). And, our networks are unable to transport large amounts of data [see my post], making where the data is located quite significant. NSF is looking at creating data infrastructures. “Not one big cloud in the sky,” he says. Access, storage, services — how do we make that happen and keep it leading edge? We also need a “suite of policies” suitable for this new environment.


He closes by talking about the Data Web Forum, a new initiative to look at a “top-down governance approach.” He points positively to the IETF’s “rough consensus and running code.” “How do we start doing that in the data world?” How do we get a balanced representation of the community? This is not a regulatory group; everything will be open source, and progress will be through rough consensus. They’ve got some funding from gov’t groups around the world. (Check CNI.org for more info.)


Now Josh Greenberg from the Sloan Foundation. He points to the opportunities presented by aggregated Big Data: the effects on social science, on libraries, etc. But the tools aren’t keeping up with the computational power, so researchers are spending too much time mastering tools, plus it can make reproducibility and provenance trails difficult. Sloan is funding some technical approaches to increasing the trustworthiness of data, including in publishing. But Sloan knows that this is not purely a technical problem. Everyone is talking about data science. Data scientist defined: Someone who knows more about stats than most computer scientists, and can write better code than typical statisticians :) But data science needs to better understand stewardship and curation. What should the workforce look like so that the data-based research holds up over time? The same concerns apply to business decisions based on data analytics. The norms that have served librarians and archivists of physical collections now apply to the world of data. We should be looking at these issues across the boundaries of academics, science, and business. E.g., economics work now rests on data from Web businesses, US Census, etc.

[I couldn't liveblog the next two — Michael and Myron — because I had to leave my computer on the podium. The following are poor summaries.]

Michael Stebbins, Assistant Director for Biotechnology in the Office of Science and Technology Policy in the White House, talked about the Administration’s enthusiasm for Big Data and open access. It’s great to see this degree of enthusiasm coming directly from the White House, especially since Michael is a scientist and has worked for mainstream science publishers.


Myron Gutmann, Ass’t Dir of the National Science Foundation, likewise expressed commitment to open access, and said that there would be an announcement in Spring 2013 that in some ways will respond to the recent UK and EC policies requiring the open publishing of publicly funded research.


After the break, there’s a panel.


Anne Kenney, Dir. of Cornell U. Library, talks about the new emphasis on digital curation and preservation. She traces this back at Cornell to 2006 when an E-Science task force was established. She thinks we now need to focus on e-research, not just e-science. She points to Walters and Skinner’s “New Roles for New Times: Digital Curation for Preservation.” When it comes to e-research, Anne points to the need for metadata stabilization, harmonizing applications, and collaboration in virtual communities. Within the humanities, she sees more focus on curation, the effect of the teaching environment, and more of a focus on scholarly products (as opposed to the focus on scholarly process, as in the scientific environment).


She points to Youngseek Kim et al. “Education for eScience Professionals”: digital curators need not just subject domain expertise but also project management and data expertise. [There's lots of info on her slides, which I cannot begin to capture.] The report suggests an increasing focus on people-focused skills: project management, bringing communities together.


She very briefly talks about Mary Auckland’s “Re-Skilling for Research” and Williford and Henry, “One Culture: Computationally Intensive Research in the Humanities and Sciences.”


So, what are research libraries doing with this information? The Association of Research Libraries has a job announcements database. And Tito Sierra did a study last year analyzing 2011 job postings. He looked at 444 job descriptions. 7.4% of the jobs were “newly created or new to the organization.” New mgt level positions were significantly higher, while subject specialist jobs were under-represented.


Anne went through Tito’s data and found 13.5% have “digital” in the title. There were more digital humanities positions than e-science. She posts a list of the new titles that jobs are being given, and they’re digilicious. 55% of those positions call for a library science degree.


Anne concludes: It’s a growth area, with responsibilities more clearly defined in the sciences. There’s growing interest in serving the digital humanists. “Digital curation” is not common in the qualifications nomenclature. MLS or MLIS is not the only path. There’s a lot of interest in post-doctoral positions.


Margarita Gregg of the National Oceanic and Atmospheric Administration begins by talking about challenges in the era of Big Data. They produce about 15 petabytes of data per year. It’s not just about Big Data, though. They are very concerned with data quality. They can’t preserve all versions of their datasets, and it’s important to keep track of the provenance of that data.


Margarita directs one of NOAA’s data centers that acquires, preserves, assembles, and provides access to marine data. They cannot preserve everything. They need multi-disciplinary people, and they need to figure out how to translate this data into products that people need. In terms of personnel, they need: Data miners, system architects, developers who can translate proprietary formats into open standards, and IP and Digital Rights Management experts so that credit can be given to the people generating the data. Over the next ten years, she sees computer science and information technology becoming the foundations of curation. There is no currently defined job called “digital curator” and that needs to be addressed.


Vicki Ferrini at the Lamont-Doherty Earth Observatory at Columbia University works on data management, metadata, discovery tools, educational materials, best practice guidelines for optimizing acquisition, and more. She points to the increased communication between data consumers and producers.


As data producers, the goal is scientific discovery: data acquisition, reduction, assembly, visualization, integration, and interpretation. And then you have to document the data (= metadata).


Data consumers: They want data discoverability and access. Increasingly they are concerned with the metadata.


The goal of data providers is to provide access, preservation and reuse. They care about data formats, metadata standards, interoperability, the diverse needs of users. [I've abbreviated all these lists because I can't type fast enough.]


At the intersection of these three domains is the data scientist. She refers to this as the “data stewardship continuum” since it spans all three. A data scientist needs to understand the entire life cycle, have domain experience, and have technical knowledge about data systems. “Metadata is key to all of this.” Skills: communication and organization, understanding the cultural aspects of the user communities, people and project management, and a balance between micro- and macro perspectives.


Challenges: Hard to find the right balance between technical skills and content knowledge. Also, data producers are slow to join the digital era. Also, it’s hard to keep up with the tech.


Andy Maltz, Dir. of the Science and Technology Council of the Academy of Motion Picture Arts and Sciences. AMPAS is about arts and sciences, he says, not about The Business.


The Science and Technology Council was formed in 2005. They have lots of data they preserve. They’re trying to build the pipeline for next-generation movie technologists, but they’re falling behind, so they have an internship program and a curriculum initiative. He recommends we read their study The Digital Dilemma. It says that there’s no digital solution that meets film’s requirement to be archived for 100 years at a low cost. It costs $400/yr to archive a film master vs $11,000 to archive a digital master (as of 2006) because of labor costs. [Did I get that right?] He says collaboration is key.
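To see what that gap means over film's 100-year archival requirement, here is a rough sketch. It takes the two figures exactly as I scribbled them above (and I've already flagged that I may have gotten them wrong), assumes both are annual per-master costs, and ignores inflation and any change in storage prices. The constant and function names are mine, purely for illustration.

```python
# Figures as reported in these liveblog notes, which flag them as uncertain.
# Assumption: both are annual costs for keeping one archived master.
FILM_COST_PER_YEAR = 400
DIGITAL_COST_PER_YEAR = 11_000
ARCHIVE_YEARS = 100  # film's stated archival requirement


def lifetime_cost(cost_per_year, years):
    """Total cost of keeping one master for `years`, with flat pricing."""
    return cost_per_year * years


film_total = lifetime_cost(FILM_COST_PER_YEAR, ARCHIVE_YEARS)
digital_total = lifetime_cost(DIGITAL_COST_PER_YEAR, ARCHIVE_YEARS)
cost_ratio = digital_total / film_total

print(f"Film master over {ARCHIVE_YEARS} years: ${film_total:,}")
print(f"Digital master over {ARCHIVE_YEARS} years: ${digital_total:,}")
print(f"Digital costs {cost_ratio:.1f}x as much")
```

With flat pricing the ratio is the same for any time span; the century totals just make the scale of the gap easier to picture.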


In January they released The Digital Dilemma 2. It found that independent filmmakers, documentarians, and nonprofit audiovisual archives are loosely coupled, widely dispersed communities. This makes collaboration more difficult. The efforts are also poorly funded, and people often lack technical skills. The report recommends the next gen of digital archivists be digital natives. But the real issue is technology obsolescence. “Technology providers must take archival lifetimes into account.” Also system engineers should be taught to consider this.


He highly recommends the Library of Congress’ “The State of Recorded Sound Preservation in the United States,” which rings an alarm bell. He hopes there will be more doctoral work on these issues.


Among his controversial proposals: Require higher math scores for MLS/MLIS students since they tend to score lower than average on that. Also, he says that the new generation of content creators have no curatorial awareness. Executives and managers need to know that this is a core business function.


Demand side data points: 400 movies/year at 2PB/movie. CNN has 1.5M archived assets, and generates 2,500 new archive objects/wk. YouTube: 72 hours of video uploaded every minute.
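For a sense of scale, those data points can be turned into yearly and daily totals. This is only a sketch of the arithmetic: the inputs are the numbers as I noted them from the talk, the 52-week year is a simplification, and the variable names are mine.

```python
# Inputs as noted from the talk; nothing here is independently verified.
MOVIES_PER_YEAR = 400
PETABYTES_PER_MOVIE = 2
CNN_NEW_OBJECTS_PER_WEEK = 2_500
YOUTUBE_HOURS_PER_MINUTE = 72

WEEKS_PER_YEAR = 52          # simplification
MINUTES_PER_DAY = 60 * 24

# Yearly volume of new movie data, in petabytes.
movie_petabytes_per_year = MOVIES_PER_YEAR * PETABYTES_PER_MOVIE

# New CNN archive objects per year.
cnn_objects_per_year = CNN_NEW_OBJECTS_PER_WEEK * WEEKS_PER_YEAR

# Hours of video uploaded to YouTube per day.
youtube_hours_per_day = YOUTUBE_HOURS_PER_MINUTE * MINUTES_PER_DAY

print(f"Movies: {movie_petabytes_per_year:,} PB per year")
print(f"CNN: {cnn_objects_per_year:,} new archive objects per year")
print(f"YouTube: {youtube_hours_per_day:,} hours of video per day")
```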


Takeaways:

  • Show business is a business.

  • Need does not necessarily create demand.

  • The nonprofit AV archive community is poorly organized.

  • Next gen needs to be digital natives with strong math and sci skills.

  • The next gen of executive leaders needs to understand the importance of this.

  • Digital curation and long-term archiving need a business case.


Q&A


Q: How about linking the monetary value of the metadata to the metadata? That would encourage the generation of metadata.


Q: Weinberger paints a picture of a flexible world of flowing data, and now we’re back in the academic, scientific world where you want good data that lasts. I’m torn.


A: Margarita: We need to look at how that data are being used. Maybe in some circumstances the quality of the data doesn’t matter. But there are other instances where you’re looking for the highest quality data.


A: [audience] In my industry, one person’s outtakes are another person’s director’s cut.


A: Anne: In the library world, we say if a little metadata would be great, a lot of it would be great. We need to step away from trying to capture the most to capturing the most useful (since we can’t capture the most). And how do you produce data in a way that’s opened up to future users, as well as being useful for its primary consumers? It’s a very interesting balance that needs to be played. Maybe short-term need is a higher thing and long-term is lower.


A: Vicki: The scientists I work with use discrete data sets, spreadsheets, etc. As we get along we’ll have new ways to check the quality of datasets so we can use the messy data as well.


Q: Citizen curation? E.g., a lot of antiques are curated by being put into people’s attics…Not sure what that might imply as model. Two parallel models?


A: Margarita: We’re going to need to engage anyone who’s interested. We need to incorporate citizen curation.


Anne: That’s already underway where people have particular interests. E.g., Cornell’s Lab of Ornithology where birders contribute heavily.


Q: What one term will bring people info about this topic?


A: Vicki: There isn’t one term, which speaks to the linked data concept.


Q: How will you recruit people from all walks of life to have the skills you want?


A: Andy: We need to convince people way earlier in the educational process that STEM is cool.


A: Anne: We’ll have to rely to some degree on post-hire education.


Q: My shop produces and integrates lots of data. We need people with domain and computer science skills. They’re more likely to come out of the domains.


A: Vicki: As long as you’re willing to take the step across the boundary, it doesn’t matter which side you start from.


Q: 7 yrs ago in library school, I was told that you need to learn a little programming so that you understand it. I didn’t feel like I had to add a whole other profession on to the one I was studying.


[2b2k] Big Data needs Big Pipes

A post by Stacy Higginbotham at GigaOm talks about the problems moving Big Data across the Net so that it can be processed. She draws on an article by Mari Silbey at SmartPlanet. Mari’s example is a telescope being built on Cerro Pachon, a mountain in Chile, that will ship many high-resolution sky photos every day to processing centers in the US.

Stacy discusses several high-speed networks, and the possibility of compressing the data in clever ways. But a person on a mailing list I’m on (who wishes to remain anonymous) pointed to GLIF, the Global Lambda Integrated Facility, which rather surprisingly is not a cover name for a nefarious organization out to slice James Bond in two with a high-energy laser pointer.

The title of its “informational brochure” [pdf] is “Connecting research worldwide with lightpaths,” which helps some. It explains:

GLIF makes use of the cost and capacity advantages offered by optical multiplexing, in order to build an infrastructure that can take advantage of various processing, storage and instrumentation facilities around the world. The aim is to encourage the shared use of resources by eliminating the traditional performance bottlenecks caused by a lack of network capacity.

Multiplexing is the carrying of multiple signals at different wavelengths on a single optical fiber. And these wavelengths are known as … wait for it … lambdas. Boom!

My mailing list buddy says that GLIF provides “100 gigabit optical waves”, which compares favorably to your pathetic earthling (um, American) 3-20 megabit broadband connection (maybe 50mb if you have FIOS), and he notes that GLIF is available in Chile.
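Here is a back-of-the-envelope sketch of how big that gap is. The 1-terabyte payload is a hypothetical size I picked for illustration (none of the posts above say how much data the telescope will actually ship), and the math ignores protocol overhead, congestion, and everything else that makes real transfers slower. The link speeds are the ones quoted above, read as bits per second.

```python
# Rough transfer-time arithmetic; ignores protocol overhead and congestion.
# The 1 TB payload is a hypothetical size chosen for illustration only.
BITS_PER_TERABYTE = 8 * 10**12  # decimal terabyte, 8 bits per byte

# Link speeds quoted in the post, in bits per second.
LINK_SPEEDS_BPS = {
    "US broadband, low end (3 Mbps)": 3 * 10**6,
    "US broadband, high end (20 Mbps)": 20 * 10**6,
    "FIOS (50 Mbps)": 50 * 10**6,
    "GLIF lightpath (100 Gbps)": 100 * 10**9,
}


def transfer_seconds(payload_bits, link_bps):
    """Ideal time to move payload_bits over a link running at link_bps."""
    return payload_bits / link_bps


for name, bps in LINK_SPEEDS_BPS.items():
    seconds = transfer_seconds(BITS_PER_TERABYTE, bps)
    print(f"{name}: {seconds:,.0f} seconds ({seconds / 3600:.2f} hours)")
```

Under those assumptions the terabyte crosses a 100-gigabit lightpath in 80 seconds, while a 20-megabit connection needs 400,000 seconds, which is more than four and a half days.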

To sum up: 1. Moving Big Data is an issue. 2. We are not at the end of innovating. 3. The bandwidth we think of as “high” in the US is a miserable joke.


By the way, you can hear an uncut interview about Big Data I did a few days ago for Breitband, a German radio program that edited, translated, and broadcast it.


[2b2k] The Internet, Science, and Transformations of Knowledge

[Note that this is cross posted at the new Digital Scholarship at Harvard blog.]

Ralph Schroeder and Eric Meyer of the Oxford Internet Institute are giving a talk sponsored by the Harvard Library on the Internet, Science, and Transformations of Knowledge.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Ralph begins by defining e-research as “Research using digital tools and digital data for the distributed and collaborative production of knowledge.” He points to knowledge as the contentious term. “But we’re going to take a crack at why computational methods are such an important part of knowledge.” They’re going to start with theory and then move to cases.

Over the past couple of decades, we’ve moved from talking about supercomputing to the grid to Web 2.0 to clouds and now Big Data, Ralph says. There is continuity, however: it’s all e-research, and to have a theory of how e-research works, you need a few components: 1. Computational manipulability (mathematization) and 2. The social-technical forces that drive that.

Computational manipulability. This is important because mathematics enables consensus and thus collaboration. “High consensus, rapid discovery.”

Research technologies and driving forces. The key to driving knowledge is research technologies, he says. I.e., machines. You also need an organizational component.

Then you need to look at how that plays out in history, physics, astronomy, etc. Not all fields are organized in the same way.

Eric now talks, beginning with a quote from a scholar who says he now has more information than he needs, all without rooting around in libraries. But others complain that we are not asking new enough questions.

He begins with the Large Hadron Collider. It takes lots of people to build it and then to deal with the data it generates. Physics is usually cited as the epitome of e-research. It is the exemplar of how to do big collaboration, he says.

Distributed computation is a way of engaging citizens in science, he says. E.g. Galaxy Zoo, which engages citizens in classifying galaxies. Citizens have also found new types of galaxies (“green peas”), etc. there. Another example: the Genetic Association Information Network is trying to find the cause of bipolarism. It has now grown into a worldwide collaboration. Another: Structure of Populations, Levels of Abundance, and Status of Humpbacks (SPLASH), a project that requires human brains to match humpback tails. By collaboratively working on data from 500 scientists around the Pacific Rim, patterns of migration have emerged, and it was possible to come up with a count of humpbacks (about 15-17K). We may even be able to find out how long humpbacks live. (It’s at least 120 years because a harpoon head was found in one from a company that went out of business that long ago.)

Ralph looks at e-research in Sweden as an example. They have a major initiative under way trying to combine health data with population data. The Swedes have been doing this for a long time. Each Swede has a unique ID; this requires the trust of the population. The social component that engenders this trust is worth exploring, he says. He points to cases where IP rights have had to be negotiated. He also points to the Pynchon Wiki where experts and the crowd annotate Pynchon’s works. Also, Google Books is a source of research data.

Eric: Has Google taken over scholarly research? 70% of scholars use Google and 66% use Google Scholar. But in the humanities, 59% go to the library. 95% consult peers and experts — they ask people they trust. It’s true in the physical sciences too, he says, although the numbers vary some.

Eric says the digital is still considered a bit dirty as a research tool. If you have too many URLs in your footnotes it looks like you didn’t do any real work, or so people fear.

Ralph: Is e-research old wine in new bottles? Underlying all the different sorts of knowledge is mathematization: a shared symbolic language with which you can do things. You have a physical core that consists of computers around which lots of different scholars can gather. That core has changed over time, but all offer types of computational manipulability. The Pynchon Wiki just needs a server. The LHC needs to be distributed globally across sites with huge computing power. The machines at the core are constantly being refined. Different fields use this power differently, and focus their efforts on using those differences to drive their fields forward. This is true in literature and language as well. These research technologies have become so important since they enable researchers to work across domains. They are like passports across fields.

A scholar who uses this tech may gain social traction. But you also get resistance: “What are these guys doing with computing and Shakespeare?”

What can we do with this knowledge about how knowledge is changing? 1. We can inform funding decisions: What’s been happening in different fields, how they are affected by social organizations, etc. 2. We need a multidisciplinary way of understanding e-research as a whole. We need more than case studies, Ralph says. We need to be aiming at developing a shared platform for understanding what’s going on. 3. Every time you use these techniques, you are either disintermediating data (e.g., Galaxy Zoo) or intermediating (biomedicine). 4. Given that it’s all digital, we as outsiders have tremendous opportunities to study it. We can analyze it. Which fields are moving where? Where are projects being funded and how are they being organized? You can map science better than ever. One project took a large chunk of academic journals and looked in real time at who is reading what, in what domain.

This lets us understand knowledge better, so we can work together better across departments and around the globe.

Q&A

Q: Sometimes you have to take a humanities approach to knowledge. Maybe you need to use some of the old systems investigations tools. Maybe link Twitter to systems thinking.

A: Good point. But caution: I haven’t seen much research on how the next generation is doing research and is learning. We don’t have the good sociology yet to see what difference that makes. Does it fragment their attention? Or is this a good thing?

Q: It’d be useful to know who borrows what books, etc., but there are restrictions in the US. How about in Great Britain?

A: If anything, it’s more restrictive in the UK. In the UK a library can’t even archive a web site without permission.
A: The example I gave of real time tracking was of articles, not books. Maybe someone will track usage at Google Books.

Q: Can you talk about what happens to the experience of interpreting a text when you have so much computer-generated data?

A: In the best cases, it’s both/and. E.g., you can’t read all the 19th century digitized newspapers, but you can compute against it. But you still need to approach it with a thought process about how to interpret it. You need both sets of skills.
A: If someone comes along and says it’s all statistics, the reply is that no one wants to read pure stats. They want to read stats put into words.

Q: There’s a science reader that lets you keep track of which papers are being read.

A: E.g., Mendeley. But it’s a self-selected group who use these tools.

Q: In the physical sciences, the more info that’s out there, the harder it is to tell what’s important.

A: One way to address it is to think about it as a cycle: as a field gets overwhelmed with info, you get tools to concentrate the information. But if you only look at a small piece of knowledge, what are you losing? In some areas, e.g., areas within physics, everyone knows everyone else and what everyone else is doing. Earth sciences is a much broader community.

[Interesting talk. It's orthogonal to my own interests in how knowledge is becoming something that "lives" at the network level, and is thus being redefined. It's interesting to me to see how this looks when sliced through at a different angle.]


[2b2k] Peter Galison on The Collective Author

Harvard professor Peter Galison (he’s actually one of only 24 University Professors, a special honor) is opening a conference on author attribution in the digital age.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He points to the vast increase in the number of physicists involved in an experiment, some of which have 3,000 people working on them. This transforms the role of experiments and how physicists relate to one another. “When CERN says in a couple of months that ‘We’ve found the Higgs particle,’ who is the we?”

He says that there has been a “pseudo-I”: A group that functions under the name of a single author. A generation or two ago this was common: “The Alvarez Group,” “Thorndike Group,” etc. This is like when the works of a Rembrandt would in fact come from his studio. But there’s also “The Collective Group”: “a group that functions without that name, often without even a single lead institution.” This requires “complex internal regulation, governance, collective responsibility, and novel ways of attributing credit.” So, over the past decades physicists have been asked very fundamental questions about how they want to govern. Those 3,000 people have never all met one another; they’re not even in the same country. So, do they stop the accelerator because of the results from one group? Or, when CERN scientists found data suggesting faster-than-light neutrinos, the team was not unanimous about publishing those results. When the results were reversed, the entire team suffered some reputational damage. “So, the stakes are very high about how these governance, decision-making, and attribution questions get decided.”

He looks back to the 1960s. There were large bubble chambers kept above their boiling point but under pressure. You’d get beautiful images of particles, and these were the iconic images of physics. But these experiments were at a new, industrial scale for physics. After an explosion in 1965, the labs were put under industrial rules and processes. In 1967 Alan Thorndike at Brookhaven responded to these changes in the ethos of being an experimenter. Rarely is the experimenter a single individual, he said. He is a composite. “He might be 3, 5 or 8, possibly as many as 10, 20, or more.” He “may be spread around geographically…He may be ephemeral…He is a social phenomenon, varied in form and impossible to define precisely.” But he certainly is not (said Thorndike) a “cloistered scientist working in isolation at his laboratory bench.” The thing that is thinking is a “composite entity.” The tasks are not partitioned in simple ways, the way contractors working on a house partition their tasks. Thorndike is talking about tasks in which “the cognition itself does not occur in one skull.”

By 1983, physicists were colliding beams that moved particles out in all directions. Bigger equipment. More particles. More complexity. Now instead of a dozen or two participants, you have 150 or so. Questions arose about what an author is. In July 1988 one of the Stanford collaborators wrote an internal memo saying that all collaborators ought to be listed as authors alphabetically since “our first priority should be the coherence of the group and the de facto recognition that contributions to a piece of physics are made by all collaborators in different ways.” They decided on a rule that avoided the nightmare of trying to give primacy to some. The memo continues: “For physics papers, all physicist members of the collaboration are authors. In addition, the first published paper should also include the engineers.” [Wolowitz! :)]

In the 1990s, rules of authorship got more specific. He points to a particular list of seven very specific rules. “It was a big battle.”

In 1997, when you get to projects as large as ATLAS at CERN, the author count goes up to 2,500. This makes it “harder to evaluate the individual contribution when comparing with other fields in science,” according to a report at the time. With experiments of this size, says Peter, the experimenters are the best source of the review of the results.

Conundrums of Authorship: It’s a community and you’re trying to keep it coherent. “You have to keep things from falling apart” along institutional or disciplinary grounds. E.g., the weak neutral current experiment. The collaborators were divided about whether there were such things. They were mockingly accused of proposing “alternating weak neutral currents,” and this cost them reputationally. But trying to make these experiments speak in one voice can come at a cost. E.g., suppose 1,900 collaborators want to publish, but 600 don’t. If they speak in one voice, that suppresses dissent.

Then there’s also the question of the “identity of physicists while crediting mechanical, cryogenic, electrical engineers, and how to balance with builders and analysts.” E.g., analysts have sometimes claimed credit because they were the first ones to perceive the truth in the data, while others say that the analysts were just dealing with the “icing.”

Peter ends by saying: These questions go down to our understanding of the very nature of science.

Q: What’s the answer?
A: It’s different in different sciences, each of which has its own culture. Some of these cultures are still emerging. It will not be solved once and for all. We should use those cultures to see what part of evaluations are done inside the culture, and which depend on external review. As I said, in many cases the most serious review is done inside where you have access to all the data, the backups, etc. Figuring out how to leverage those sorts of reviews could help to provide credit when it’s time to promote people. The question of credit between scientists and engineers/technicians has been debated for hundreds of years. I think we’ve begun to shed some of our class anxiety, i.e., the assumption that hand work is not equivalent to head work, etc. A few years ago, some physicists would say that nanotech is engineering, not science; you don’t hear that so much any more. When a Nobel prize in 1983 went to an engineer, it was a harbinger.

Q: Have other scientists learned from the high energy physicists about this?
A: Yes. There are different models. Some big science gets assimilated to a culture that is more like a big engineering process. E.g., there’s no public awareness of the lead designers of the 747 we’ve been flying for 50 years, whereas we know the directors of Hollywood films. Authorship is something we decide. That the 747 has no author but Hunger Games does was not decreed by Heaven. Big plasma physics is treated more like industry, in part because it’s conducted within a secure facility. The astronomers have done many admirable things. I was on a prize committee that gave the award to a group because it was a collective activity. Astronomers have been great about distributing data. There’s Galaxy Zoo, and some “zookeepers” have been credited as authors on some papers.

Q: The credits are getting longer on movies as the specializations grow. It’s a similar problem. They tell you who did what in each category. In high energy physics, scientists see becoming too specialized as a bad thing.
A: In the movies many different roles are recognized. And there are questions of distribution of profits, which is not so analogous to physics experiments. Physicists want to think of themselves as physicists, not as sub-specialists. If you are identified as, for example, the person who wrote the Monte Carlo, people may think that you’re “just a coder” and write you off. The first Ph.D. in physics submitted at Harvard was on the Bohr model; the student was told that it was fine but he had to do an experiment because theoretical physics might be great for Europe but not for the US. It’s naive to think that physicists are Da Vincis who do everything; the idea of what counts as being a physicist is changing, and that’s a good thing.

[I wanted to ask: if (assuming what may not be true) the Internet leads to more of the internal work being done visibly in public, might this change some of the governance, since it will be clearer that there is diversity and disagreement within a healthy network of experimenters? Anyway, that was a great talk.]


[2b2k] The Net as paradigm

Edward Burman recently sent me a very interesting email in response to my article about the 50th anniversary of Thomas Kuhn’s The Structure of Scientific Revolutions. So I bought his 2003 book Shift!: The Unfolding Internet – Hype, Hope and History (hint: If you buy it from Amazon, check the non-Amazon sellers listed there) which arrived while I was away this week. The book is not very long — 50,000 words or so — but it’s dense with ideas. For example, Edward argues in passing that the Net exploits already-existing trends toward globalization, rather than leading the way to it; he even has a couple of pages on Heidegger’s thinking about the nature of communication. It’s a rich book.

Shift! applies The Structure of Scientific Revolutions to the Internet revolution, wondering what the Internet paradigm will be. The chapters that go through the history of failed attempts to understand the Net, the “pre-paradigms,” are fascinating. Much of Edward’s analysis of business’ inability to grasp the Net mirrors Cluetrain’s themes. (In fact, I had the authorial d-bag reaction of wishing he had referenced Cluetrain…until I realized that Edward probably had the same reaction to my later books which mirror ideas in Shift!) The book is strong in its presentation of Kuhn’s ideas, and has a deep sense of our cultural and philosophical history.

All that would be enough to bring me to recommend the book. But Edward admirably jumps in with a prediction about what the Internet paradigm will be:

This…brings us to the new paradigm, which will condition our private and business lives as the twenty-first century evolves. It is a simple paradigm, and may be expressed in synthetic form in three simple words: ubiquitous invisible connectivity. That is to say, when the technologies, software and devices which enable global connectivity in real time become so ubiquitous that we are completely unaware of their presence…We are simply connected. [p. 170]

It’s unfair to leave it there since the book then elaborates on this idea in very useful ways. For example, he talks about the concept of “e-business” as being a pre-paradigm, and the actual paradigm being “The network itself becomes the company,” which includes an erosion of hierarchy by networks. But because I’ve just written about Kuhn, I found myself particularly interested in the book’s overall argument that Kuhn gives us a way to understand the Internet. Is there an Internet paradigm shift?

There are two ways to take this.

First, is there a paradigm by which we will come to understand the Internet? Edward argues yes, we are rapidly settling into the paradigmatic understanding of the Net. In fact, he guesses that “the present revolution [will] be completed and the new paradigm of being [will] be in force” in “roughly five to eight years” [p. 175]. He sagely points to three main areas where he thinks there will be sufficient development to enable the new paradigm to take root: the rise of the mobile Internet, the development of productivity tools that “facilitate improvements in the supply chain” and marketing, and “the increased deployment of what have been termed social applications, involving education and the political sphere of national and local government.” [pp. 175-176] Not bad for 2003!

But I’d point to two ways, important to his argument, in which things have not turned out as Edward thought. First, the 5-8 years after the book came out were marked by a continuing series of disruptive Internet developments, including general purpose social networks, Wikipedia, e-books, crowdsourcing, YouTube, open access, open courseware, Khan Academy, etc. etc. I hope it’s obvious that I’m not criticizing Edward for not being prescient enough. The book is pretty much as smart as you can get about these things. My point is that the disruptions just keep coming. The Net is not yet settling down. So we have to ask: Is the Net going to enable continuous disruption and self-transformation? If so, will it be captured by a paradigm? (Or, as M. Night Shyamalan might put it, is disruption the paradigm?)

Second, after listing the three areas of development over the next 5-8 years, the book makes a claim central to the basic formulation of the new paradigm Edward sees emerging: “And, vitally, for thorough implementation [of the paradigm] the three strands must be invisible to the user: ubiquitous and invisible connectivity.” [p. 176] If the invisibility of the paradigm is required for its acceptance, then we are no closer to that event, for the Internet remains perhaps the single most evident aspect of our culture. No other cultural object is mentioned as many times in a single day’s newspaper. The Internet, and the three components the book points to, are more evident to us than ever. (The exception might be innovations in logistics and supply chain management; I’d say Internet marketing remains highly conspicuous.) We’ve never had a technology that so enabled innovation and creativity, but there may well come a time when we stop focusing so much cultural attention on the Internet. We are not close yet.

Even then, we may not end up with a single paradigm of the Internet. It’s really not clear to me that the attendees at ROFLcon have the same Net paradigm as less Internet-besotted youths. Maybe over time we will all settle into a single Internet paradigm, but maybe we won’t. And we might not because the forces that bring about Kuhnian paradigms are not at play when it comes to the Internet. Kuhnian paradigms triumph because disciplines come to us through institutions that accept some practices and ideas as good science; through textbooks that codify those ideas and practices; and through communities of professionals who train and certify the new scientists. The Net lacks all of that. Our understanding of the Net may thus be as diverse as our cultures and sub-cultures, rather than being as uniform and enforced as, say, genetics’ understanding of DNA is.

Second, is the Internet affecting what we might call the general paradigm of our age? Personally, I think the answer is yes, but I wouldn’t use Kuhn to explain this. I think what’s happening — and Edward agrees — is that we are reinterpreting our world through the lens of the Internet. We did this when clocks were invented and the world started to look like a mechanical clockwork. We did this when steam engines made society and then human motivation look like the action of pressures, governors, and ventings. We did this when telegraphs and then telephones made communication look like the encoding of messages passed through a medium. We understand our world through our technologies. I find (for example) Lewis Mumford more helpful here than Kuhn.

Now, it is certainly the case that reinterpreting our world in light of the Net requires us to interpret the Net in the first place. But I’m not convinced we need a Kuhnian paradigm for this. We just need a set of properties we think are central, and I think Edward and I agree that these properties include the abundant and loose connections, the lack of centralized control, the global reach, the ability of everyone (just about) to contribute, the messiness, the scale. That’s why you don’t have to agree about what constitutes a Kuhnian paradigm to find Shift! fascinating, for it helps illuminate the key question: How are the properties of the Internet becoming the properties we see in — or notice as missing from — the world outside the Internet?

Good book.
