Apple e-books decision explained – mainstream and Reddit

A judge has ruled that Apple is guilty of price-fixing in its attempt to get the major publishers to unite against Amazon’s discounting of e-books.

Now, that’s not a very helpful — and possibly not entirely accurate — explanation. If you want more, there’s a thread at Reddit that has some terrific explanations at various level of detail (e.g., this one), as well as bunches of questions asked and answered. And, of course, some digressions, hip shots, and smug wrongnesses.

There are certainly some helpful analyses and explanations from the mainstream: e.g., WSJ, Wired, Bloomberg. In fact, I’d be hard-pressed to choose among those three and the Reddit comment I linked to above. But the Reddit thread is — at least to my taste — a better way to explore the issue: a variety of views expressed at appropriate lengths, with questions posed at various levels of sophistication, and with a conversation that goes where it wants to without a fear of dead ends.

Now, I’m aware that if you go to the Reddit thread, you’ll be appalled by how much there is wrong with it. Yeah, I’m not blind to it. But consider what an amazing emergent artifact that thread is. It combines in one flow “explainers” and analysis as good as you’ll find from professionals, Q&A, and a a social froth that you can easily ignore if it is not to your liking. This is what journalism looks like — one of the ways it looks — when the old constraints of space, authorial ownership, and editorial process are lifted, and a larger We gets our hands on it. Pretty fascinating.

Categories: journalism, reddit

Tags:

No Comments

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back and forth between two people I deeply respect: Dan Brickley [twitter:danbri] and Ed Summers [twitter:edsu]. It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford [twitter:3windmills], who is on the team working on BibFrame:

After a couple of tweets, Dan tweeted the following:


There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But, Dan's comment clarified my understanding of why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linneaus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linneaus chose to organize species based upon visible differences — which might not be the "essential" differences — so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

For example, a DTD (document type definition) typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. For example, an industry might agree on a DTD for parts catalogs that specifies that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include...although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this DTD's are much like forms: You don't put a field for earlobe length in the college application form you're designing.

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.

The line between DTD's and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTD's have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among parts will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.

Categories: dtd, everythingismisc, misc, ontologies, sgml

Tags:

No Comments

[2b2k] Knowledge in its natural state

I gave a 20 minute talk at the Wired Next Fest in Milan on June 1, 2013. Because I needed to keep the talk to its allotted time and because it was being simultaneously translated into Italian, I wrote it out and gave a copy to the translators. Inevitably, I veered from the script a bit, but not all that much. What follows is the script with the veerings that I can remember. The paragraph breaks track to the slide changes

(I began by thanking the festival, and my progressive Italian publisher, Codice Edizioni Codice are pragmatic idealists and have been fantastic to work with.)

Knowledge seems to fit so perfectly into books. But to marvel at how well Knowledge fits into books…

… is to marvel at how well each rock fits into its hole in the ground. Knowledge fits books because we’ve shaped knowledge around books and paper.

And knowledge has taken on the properties of books and paper. Like books, knowledge is ordered and orderly. It is bounded, just as books stretch from cover to cover. It is the product of an individual mind that then is filtered. It is kept private and we’re not responsible for it until it’s published. Once published, it cannot be undone. It creates a privileged class of experts, like the privileged books that are chosen to be published and then chosen to be in a library

Released from the bounds of paper, knowledge takes on the shape of its new medium, the Internet. It takes on the properties of its new medium just it had taken on the properties of its old paper medium. It’s my argument today that networked knowledge assumes a more natural shape. Here are some of the properties of new, networked knowledge

1. First, because it’s a network, it’s linked.

2. These links have no natural stopping point for your travels. If anything, the network gives you temptations to continue, not stopping points.

3. And, like the Net, it’s too big for any one head, Michael Nielsen, the author of Reinventing Discovery, uses the discovery of the Higgs Boson as an example. That discovery required gigantic networks of equipment and vast networks of people. There is no one person who understands everything about the system that proved that that particle exists. That knowledge lives in the system, in the network.

4. Like the net, networked knowledge is in perpetual disagreement. There is nothing about which everyone agrees. We like to believe this is a temporary state, but after thousands of years of recorded history, we can now see for sure that we are never going to agree about anything. The hope for networked knoweldge is that we’re learning to disagree more fruitfully, in a linked environment

5. And, as the Internet makes very clear, we are fallible creatures. We get everything wrong. So, networked knowledge becomes more credible when it acknowledges fallibility. This is very different from the old paper based authorities who saw fallibility as a challenge to their authority.

6. Finally, knowledge is taking on the humor of the Internet. We’re on the Internet voluntarily and freed of the constrictions of paper, it turns out that we like being with one another. Even when the topic is serious like this topic at Reddit [a discussion of a physics headline], within a few comments, we’re making jokes. And then going back to the serious topic. Paper squeezed the humor out of knowledge. But that’s unnatural.

These properties of networked knowledge are also properties of the Network. But they’re also properties that are more human and more natural than the properties of traditional knowledge.

But there’s one problem:

There is no such thing as natural knowledge. Knowledge is a construct. Our medium may have changed, but we haven’t, at least so it seems. And so we’re not free to reinvent knowledge any way we’d like. Significant problems based on human tendencies are emerging. I’ll point to four quick problem areas.

First, We see the old patterns of concentration of power reemerge on the Net. Some sites have an enormous number of viewers, but the vast majority of sites have very few. [Slide shows Clay Shirky’s Power Law distribution chart, and a photo of Clay]

Albert-László Barabási has shown that this type of clustering is typical of networks even in nature, and it is certainly true of the Internet

Second, on the Internet, without paper to anchor it, knowledge often loses its context. A tweet…

Slips free into the wild…

It gets retweeted and perhaps loses its author

And then gets retweeted and lose its meaning. And now it circulates as fact. [My example was a tweet about the government not allowing us to sell body parts morphing into a tweet about the government selling body parts. I made it up.]

Third, the Internet provides an incentive to overstate.

Fourth, even though the Net contains lots of different sorts of people and ideas and thus should be making us more open in our beliefs…

… we tend to hang out with people who are like us. It’s a natural human thing to prefer people “like us,” or “people we’re comfortable with.” And this leads to confirmation bias — our existing beliefs get reinforced — and possibly to polarization, in which our beliefs become more extreme.

This is known as the echo chamber problem, and it’s a real problem. I personally think it’s been overstated, but it is definitely there.

So there are four problems with networked knowledge. Not one of them is new. Each has a analog from before the Net.

  1. The loss of context has always been with us. Most of what we believe we believe because we believe it, not because of evidence. At its best we call it, in English, common sense. But history has shown us that common sense can include absurdities and lead to great injustices.

  2. Yes, the Net is not a flat, totally equal place. But it is far less centralized than the old media were, where only a handful of people were allowed to broadcast their ideas and to choose which ideas were broadcast.

  3. Certainly the Internet tends towards overstatement. But we have had mass media that have been built on running over-stated headlines. This newspaper [Weekly World News] is a humor paper, but it’s hard to distinguish from serious broadcast news.

  4. And speaking of Fox, yes, on the Internet we can simply stick with ideas that we already agree with, and get more confirmed in our beliefs. But that too is nothing new. The old media actually were able to put us into even more tightly controlled echo chambers. We are more likely to run into opposing ideas — and even just to recognize that there are opposing ideas — on the Net than in a rightwing or leftwing newspaper.

It’s not simply that all the old problems with knowledge have reemerged. Rather, they’ve re-emerged in an environment that offers new and sometimes quite substantial ways around them.

  1. For example, if something loses its context, we can search for that context. And links often add context.

  2. And, yes, the Net forms hubs, but as Clay Shirky and Chris Anderson have pointed out, the Net also lets a long tail form, so that voices that in the past simply could not have been heard, now can be. And the activity in that long tail surpasses the attention paid to the head of the tail.

  3. Yes, we often tend to overstate things on the Net, but we also have a set of quite powerful tools for pushing back. We review our reviews. We have sites like the well-regarded American site, Snopes.com, that will tell you if some Internet rumor is true. Snopes is highly reliable. Then we have all of the ways we talk with one another on the Net, evaluating the truth of what we’ve read there.

  4. And, the echo chamber is a real danger, but we also have on the Net the occasional fulfillment of our old ideal of being able to have honest, respectful conversations with people with whom we fundamentally disagree. These examples are from Reddit, but there are others.

So, yes, there are problems of knowledge that persist even when our technology of knowledge changes. That’s because these are not technical problems so much as human problems…

…and thus require human solutions. And the fundamental solution is that we need to become more self-aware about knowledge.

Our old technology — paper — gave us an idea of knowledge that said that knowledge comes from experts who are filtered, printed, and then it’s settled, because that’s how books work. Our new technology shows us we are complicit in knowing. In order to let knowledge get as big as our new medium allows, we have to recognize that knowledge comes from all of us (including experts), it is to be linked, shared, discussed, argued about, made fun of, and is never finished and done. It is thoroughly ours – something we build together, not a product manufactured by unknown experts and delivered to us as if it were more than merely human.

The required human solution therefore is to accept our human responsibility for knowledge, to embrace and improve the technology that gives knowledge to us –- for example, by embracing Open Access and the culture of linking and of the Net, and to be explicit about these values.

Becoming explicit is vital because our old medium of knowledge did its best to hide the human qualities of knowledge. Our new medium makes that responsibility inescapable. With the crumbling of the paper authorities, it bcomes more urgent than ever that we assume personal and social responsibility for what we know.

Knowing is an unnatural act. If we can remember that –- remember the human role in knowing — we now have the tools and connections that will enable even everyday knowledge to scale to a dimension envisioned in the past only by the mad and the God-inspired.

Thank you.

Categories: conferences, culture, italy, liveblog, milan, too big to know

Tags:

2 Comments

[2b2k] Can business intelligence get intelligent enough?

Greg Silverman [twitter:concentricabm], the CEO of Concentric, has a good post at CMS Wire about the democratization of market analysis. He makes what seems to me to be a true and important point: market researchers now have the tools to enable them to slice, dice, deconstruct, and otherly-construct data without having to rely upon centralized (and expensive) analytics firms. This, says Greg, changes not only the economics of research, but also the nature of the results:

The marketers’ relationships with their analytics providers are currently strained as a service-based, methodologically undisclosed and one-off delivery of insights. These providers and methods are pitted against a new generation of managers and executives who are “data natives” —professionals who rose to the top by having full control of their answering techniques, who like to be empowered and in charge of their own destinies, and who understand the world as a continuous, adaptive place that may have constantly changing answers. This new generation of leaders likes to identify tradeoffs and understand the “grayness” of insight rather than the clarity being marketed by the service providers.

He goes on to make an important point about the perils of optimization, which is what attracted the attention of Eric Bonabeau [twitter:bonabeau], whose tweet pointed me at the post.

The article’s first point, though, is interesting from the point of view of the networking of knowledge, because it’s not an example of the networking of knowledge. This new generation of market researchers are not relying on experts from the Central Authority, they are not looking for simple answers, and they’re comfortable with ambiguity, all of which are characteristics of networked knowledge. But, at least according to Greg’s post, they are not engaging with one another across company boundaries, sharing data, models, and insights. I’m going to guess that Greg would agree that there’s more of that going on than before. But not enough.

If the competitive interests of businesses are going to keep their researchers from sharing ideas and information in vigorous conversations with their peers and others, then businesses simply won’t be as smart as they could be. Openness optimizes knowledge system-wide, but by definition it doesn’t concentrate knowledge in the hands of a few. And this may form an inherent limit on how smart businesses can become.

Categories: business, commons, marketing

Tags:

No Comments

[2b2k] Is big data degrading the integrity of science?

Amanda Alvarez has a provocative post at GigaOm:

There’s an epidemic going on in science: experiments that no one can reproduce, studies that have to be retracted, and the emergence of a lurking data reliability iceberg. The hunger for ever more novel and high-impact results that could lead to that coveted paper in a top-tier journal like Nature or Science is not dissimilar to the clickbait headlines and obsession with pageviews we see in modern journalism.

The article’s title points especially to “dodgy data,” and the item in this list that’s by far the most interesting to me is the “data reliability iceberg,” and its tie to the rise of Big Data. Amanda writes:

…unlike in science…, in big data accuracy is not as much of an issue. As my colleague Derrick Harris points out, for big data scientists the abilty to churn through huge amounts of data very quickly is actually more important than complete accuracy. One reason for this is that they’re not dealing with, say, life-saving drug treatments, but with things like targeted advertising, where you don’t have to be 100 percent accurate. Big data scientists would rather be pointed in the right general direction faster — and course-correct as they go – than have to wait to be pointed in the exact right direction. This kind of error-tolerance has insidiously crept into science, too.

But, the rest of the article contains no evidence that the last sentence’s claim is true because of the rise of Big Data. In fact, even if we accept that science is facing a crisis of reliability, the article doesn’t pin this on an “iceberg” of bad data. Rather, it seems to be a melange of bad data, faulty software, unreliable equipment, poor methodology, undue haste, and o’erweening ambition.

The last part of the article draws some of the heat out of the initial paragraphs. For example: “Some see the phenomenon not as an epidemic but as a rash, a sign that the research ecosystem is getting healthier and more transparent.” It makes the headline and the first part seem a bit overstated — not unusual for a blog post (not that I would ever do such a thing!) but at best ironic given this post’s topic.

I remain interested in Amanda’s hypothesis. Is science getting sloppier with data?

Categories: big data, science, too big to know

Tags:

No Comments

[errata] Inconsistent reference

The next day: Double d’oh! Mcconstock points out that Pauling didn’t have a stinking vaccine. So, it’s wrong, not just inconsistent. Thanks again, mccomstock!


mccomstock on Twitter points out that when I do a call back to Salk, I instead reference Pauling:

“It has been rare and hard-won such as Darwin with his barnacles and Linus Pauling with his vaccine” (page 176)

Too late to fix it in the paperback. At least it’s not factually wrong, just infelicitous. Thanks, mccomstock!

Categories: errata

Tags: , ,

No Comments

[2b2k] What we can learn from what we don’t know

I wrote a piece in the early afternoon yesterday about what we can learn from watching how we fill in the blanks when we don’t know stuff…in this case, when we don’t know much about Suspect #1 and #2. It’s about the narratives that shape our unserstanding.

For example, it turns out that I only have three Mass Murderer Narratives: Terrorist, Anti-Social, or Delusional. As we learned more about Suspect #2 yesterday, he seemed not to fit well into any of them. Perhaps he will once we know more, or perhaps my brain will cram him into one even if he doesn’t fit. Anyway, you can read the post at CNN.

 


I find myself unwilling to use Suspect #2′s name today because Martin Richard is too much with me.

Categories: boston, cnn, journalism, marathon, narratives, stories

Tags:

No Comments

[misc][2b2k] Making Twitter better for disasters

I had both CNN and Twitter on yesterday all afternoon, looking for news about the Boston Marathon bombings. I have not done a rigorous analysis (nor will I, nor have I ever), but it felt to me that Twitter put forward more and more varied claims about the situation, and reacted faster to misstatements. CNN plodded along, but didn’t feel more reliable overall. This seems predictable given the unfiltered (or post-filtered) nature of Twitter.

But Twitter also ran into some scaling problems for me yesterday. I follow about 500 people on Twitter, which gives my stream a pace and variety that I find helpful on a normal day. But yesterday afternoon, the stream roared by, and approached filter failure. A couple of changes would help:

First, let us sort by most retweeted. When I’m in my “home stream,” let me choose a frequency of tweets so that the scrolling doesn’t become unwatchable; use the frequency to determine the threshold for the number of retweets required. (Alternatively: simply highlight highly re-tweeted tweets.)

Second, let us mute based on hashtag or by user. Some Twitter cascades I just don’t care about. For example, I don’t want to hear play-by-plays of the World Series, and I know that many of the people who follow me get seriously annoyed when I suddenly am tweeting twice a minute during a presidential debate. So let us temporarily suppress tweet streams we don’t care about.

It is a lesson of the Web that as services scale up, they need to provide more and more ways of filtering. Twitter had “follow” as an initial filter, and users then came up with hashtags as a second filter. It’s time for a new round as Twitter becomes an essential part of our news ecosystem.

Categories: boston marathon, everythingismisc, everythingIsMiscellaneous, journalism, twitter

Tags:

No Comments

Elsevier acquires Mendeley + all the data about what you read, share, and highlight

I liked the Mendeley guys. Their product is terrific — read your scientific articles, annotate them, be guided by the reading behaviors of millions of other people. I’d met with them several times over the years about whether our LibraryCloud project (still very active but undergoing revisions) could get access to the incredibly rich metadata Mendeley gathers. I also appreciated Mendeley’s internal conflict about the urge to openness and the need to run a business. They were making reasonable decisions, I thought. At they very least they felt bad about the tension :)

Thus I was deeply disappointed by their acquisition by Elsevier. We could have a fun contest to come up with the company we would least trust with detailed data about what we’re reading and what we’re attending to in what we’re reading, and maybe Elsevier wouldn’t win. But Elsevier would be up there. The idea of my reading behaviors adding economic value to a company making huge profits by locking scholarship behind increasingly expensive paywalls is, in a word, repugnant.

In tweets back and forth with Mendeley’s William Gunn [twitter: mrgunn], he assures us that Mendeley won’t become “evil” so long as he is there. I do not doubt Bill’s intentions. But there is no more perilous position than standing between Elsevier and profits.

I seriously have no interest in judging the Mendeley folks. I still like them, and who am I to judge? If someone offered me $45M (the minimum estimate that I’ve seen) for a company I built from nothing, and especially if the acquiring company assured me that it would preserve the values of that company, I might well take the money. My judgment is actually on myself. My faith in the ability of well-intentioned private companies to withstand the brute force of money has been shaken. After all this time, I was foolish to have believed otherwise.

MrGunn tweets: “We don’t expect you to be joyous, just to give us a chance to show you what we can do.” Fair enough. I would be thrilled to be wrong. Unfortunately, the real question is not what Mendeley will do, but what Elsevier will do. And in that I have much less faith.

 


I’ve been getting the Twitter handles of Mendeley and Elsevier wrong. Ack. The right ones: @Mendeley_com and @ElsevierScience. Sorry!

Categories: annotations, copyright, culture, elsevier, mendeley, open access, too big to know

Tags:

No Comments

[2b2k] Back when not every question had an answer

Let me remind you young whippersnappers what looking for knowledge was like before the Internet (or “hiphop” as I believe you call it).

Cast your mind back to 1982, when your Mommy and Daddy weren’t even gleams in each other’s eyes. I had just bought my first computer, a KayPro II.

I began using WordStar and ran into an issue pretty quickly. For my academic writing, I needed to create end notes. Since the numbering of those notes would change as I took advantage of WordStar’s ability to let me move blocks of text around (^KB and ^KK, I believe, marked the block), I’d have to go back and re-do the numbering both in the text and in the end notes section. What a bother!

I wanted to learn how to program anyway, so I sat down with the included S-Basic manual. S-Basic shared syntax with BASIC, but it assumed you’d write functions, not just lines of code to be executed in numbered order. This made it tougher to learn, but that’s not what stopped me at first. The real problem I had was figuring out how to open a file so that I could read it. (My program was going to look for anything between a “[[" and a "]]”,, which would designate an in-place end note.)The manual assumed I knew more than I did, what with its file handlers and strange parameters for what type of file I was reading and what types of blocks of data I wanted to read.

I spent hours and hours and hours, mainly trying random permutations. I was so lacking the fundamental concepts that I couldn’t even figure out what to play with. I was well and truly stuck.

“Simple!” you say. “Just go on the Internet…and…oh.” So, it’s 1982 and you have a programming question. Where do you go? The public library? It was awfully short on programming manuals at that time, and S-Basic was an oddball language. To your local bookstore? Nope, no one was publishing about S-Basic. Then, how about to…or…well…no…then?…nope, not for another 30 years.

I was so desperate that I actually called the Boston University switchboard, and got connected to a helpful receptionist in the computer science division (or whatever it was called back then), who suggested a professor who might be able to help me. I left a message along the lines of “I’m a random stranger with a basic question about a programming language you probably never heard of, so would you mind calling me back? kthxbye.” Can you guess who never called me back?

Eventually I did figure it out, if by “figuring out” you mean “guessed.” And by odd coincidence, as I contemplate moving to doing virtually all my writing in a text editor, I’m going to be re-writing that little endnoter pretty soon now.

But that’s not my point. My point is that YOU HAVE NO IDEA HOW LUCKY YOU ARE, YOU LITTLE BASTARDS.

 


For those of you who don’t know what it’s like to get a programming question answered in 2013, here are some pretty much random examples:

Categories: kaypro, old fart, programming, tech

Tags:

1 Comment