Archive for category misc

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back and forth between two people I deeply respect: Dan Brickley [twitter:danbri] and Ed Summers [twitter:edsu]. It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford [twitter:3windmills], who is on the team working on BibFrame:

After a couple of tweets, Dan tweeted the following:


There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But Dan's comment clarified my understanding of why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linnaeus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linnaeus chose to organize species based upon visible differences (which might not be the "essential" differences) so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

For example, a DTD (document type definition) typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. An industry might, for instance, agree on a DTD for parts catalogs that specifies that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.
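To make the parts catalog idea concrete, here is a rough sketch in Python of the kind of checking such an agreement enables. It is not a real DTD validator and not any actual industry standard: the element names (including "rotation" for the clockwise/counterclockwise note), the `check_catalog` function, and the sample document are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names modeled on the parts-catalog example above;
# they are illustrative assumptions, not a real industry DTD.
REQUIRED = ["type", "part_number", "length", "height",
            "weight", "material", "description"]
OPTIONAL = ["rotation"]  # clockwise or counterclockwise


def check_catalog(xml_text):
    """Return a list of problems found in a parts catalog document."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    parts = root.findall("part")
    if not parts:
        problems.append("catalog has no <part> elements")
    for index, part in enumerate(parts, start=1):
        names = [child.tag for child in part]
        # Every required element must be present under its standard name.
        for field in REQUIRED:
            if field not in names:
                problems.append(f"part {index}: missing <{field}>")
        # Anything outside the agreed vocabulary is flagged.
        for name in names:
            if name not in REQUIRED and name not in OPTIONAL:
                problems.append(f"part {index}: unexpected <{name}>")
    return problems


# A supplier used "partno" instead of the agreed "part_number".
SUPPLIER_DOC = """
<catalog>
  <part>
    <type>bolt</type>
    <partno>B-101</partno>
    <length>40</length>
    <height>8</height>
    <weight>12</weight>
    <material>steel</material>
    <description>Hex bolt</description>
    <rotation>clockwise</rotation>
  </part>
</catalog>
"""

for problem in check_catalog(SUPPLIER_DOC):
    print(problem)
```

The point of the sketch is only that the list of required and optional elements is chosen for a purpose: it names the properties this particular industry's systems need and nothing else.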

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include...although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this respect, DTDs are much like forms: You don't put a field for earlobe length in the college application form you're designing.

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.
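Here is a toy sketch, again in Python, of what "knowledge that is implicit in the model" can mean for the jobs example. The class names, property names, and helper functions are all hypothetical; a real ontology would normally be written in a language such as RDFS or OWL rather than as Python dictionaries.

```python
# Hypothetical class hierarchy for a tiny "jobs" ontology (illustrative only).
SUBCLASS_OF = {
    "PartTimeJob": "Job",
    "FullTimeJob": "Job",
    "SeasonalJob": "Job",
    "Salary": "Compensation",
}

# Properties declared directly on a class.
PROPERTIES = {
    "Job": {"employer", "worker", "compensation"},
    "SeasonalJob": {"season"},
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if cls is target or one of its subclasses."""
    return cls == target or target in ancestors(cls)


def properties_of(cls):
    """Properties declared on cls plus those inherited from its superclasses."""
    props = set(PROPERTIES.get(cls, set()))
    for parent in ancestors(cls):
        props |= PROPERTIES.get(parent, set())
    return props


# Nobody stated that a seasonal job has an employer; it follows from the model.
print(is_a("SeasonalJob", "Job"))
print(sorted(properties_of("SeasonalJob")))
```

Even in a toy like this, the designer has to decide in advance which slots and relationships to declare, which is exactly where the "just in case someone sometime needs that info" pressure comes from.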

The line between DTDs and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTDs have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among parts will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.


[annotation][2b2k] Opencast-Matterhorn

Andy Wasklewicz and Jeff Austin from Entwine [twitter:entwinemedia] describe a multi-institutional project to build a platform-agnostic tool for enriching video through note-taking, structured annotations, and sharing. It uses HTML 5, and allows for structured tagging, time-based annotation, and more.


[2b2k] What knowledge is losing

Jon Lebkowsky in a discussion of Too Big to Know at The Well asked, “What new roles are emerging that weren’t there before?”

Here’s part of my answer (with a few typos fixed):

- Taxonomies, nomenclatures, classification. Having common ways to refer to things is really helpful. We can make up for their loss to at least some degree by cross-walking and mapping. It’s always going to be messy. The rise of unique IDs and namespaces is helping a great deal.

- Filters. We used to not worry about filters because all we could get was the filtered product. Now we have to worry about them all the time. But we also now filter forward rather than filter out: When the site TheBrowser.com puts together a front page with 10 items on it from around the Web, all the other items that didn’t make it onto the front page are still fully available; TheBrowser.com has merely shortened the number of clicks it takes to get to its ten.

- Consensus. We used to think that we “all” agreed on some things. We had authorities we “all” trusted. Now we have communities of belief. Links and conversation can help us get past the fragmentation that makes us stupid, but not past all fragmentation.

But we should keep in mind that we’ve lost these old formations to a large degree because they don’t scale, and because they presented themselves to us under false pretenses: they were never as baked into the world as they seemed.

It’s our knowledge now.


Why I’ve been quiet

There’s been just so much to do. I’ve been on double deadlines (which, btw, is the direct opposite of double rainbows), while the Library Innovation Lab project for the DPLA beta sprint has been roaring forward. But, as of two minutes ago, I have reached a moment when I can breathe…for a minute.

I turned in the final copy-edited version of Too Big to Know a few minutes ago. The copy editor, Christine Arden, was a dream, finding errors and infelicities at every level of the book. Plus, she occasionally put in a note about something she liked; that matters a lot to me. Anyway, it was due today and I hit the send button at 5:10.

So, sure, yay and congratulations. But from here on in, the book only gets worse. Let me put it like this: It sure isn’t gonna get any better. It’s a relief to be done, of course, but it is anxiety-making to watch the world change as the book stays the same.

I also was on deadline to submit a Scientific American article, which I did on Monday. I’m excited to have something considered by them. (They can always say no, even though it was their idea, and I’ve been working with a really good editor there.)

As for the Library Innovation Lab, we are doing this amazing project for DPLA that is coming together. There are some gigantic, chewy issues we’ve had to work through, and we have been working on them with some fantastic people. If we get this even close to right (and I’m confident we will), it will make some very hard problems look so easy that they’re invisible. It’s going to be cool. I am learning so much watching my colleagues work through these issues at a level I can barely hang on to. And then there are all the fascinating problems of building an app that makes people think it’s easy to navigate through tens of millions of works.

It’s been a busy summer. And despite sending off the two large writing projects that have occupied me for a while, I don’t anticipate it getting any less busy.
