Appreciating the wisdom of Clay Shirky’s comments regarding the early Web and how it would end

(written by Lawrence Krubner, however indented passages are often quotes).

When the Web first emerged in the mid-1990s, people were astonished to find that there existed a “long tail” full of more revenue potential than the “fat body”. Amazon proved this early on. At the time, a large Barnes and Noble might sell 130,000 unique items, but Amazon was getting the majority of its revenue from the millions of items that were not among its 130,000 best-selling items.

When we speak of what made the early Web unique and exciting, a lot of what we are talking about is the discovery of this long tail: the millions of small topics that small groups of people were passionate about, everything from Pokemon to slash fic to scholarly discussions of why Napoleon won the Battle of Austerlitz.

Back in 2003, Clay Shirky wrote “Power Laws, Weblogs, and Inequality”, which suggested one way the quality of the early Web would eventually decline:

However, though the inequality is mostly fair now, the system is still young. Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes.

Though there are more new bloggers and more new readers every day, most of the new readers are adding to the traffic of the top few blogs, while most new blogs are getting below average traffic, a gap that will grow as the weblog world does. It’s not impossible to launch a good new blog and become widely read, but it’s harder than it was last year, and it will be harder still next year. At some point (probably one we’ve already passed), weblog technology will be seen as a platform for so many forms of publishing, filtering, aggregation, and syndication that blogging will stop referring to any particularly coherent activity. The term ‘blog’ will fall into the middle distance, as ‘home page’ and ‘portal’ have, words that used to mean some concrete thing, but which were stretched by use past the point of meaning. This will happen when head and tail of the power law distribution become so different that we can’t think of J. Random Blogger and Glenn Reynolds of Instapundit as doing the same thing.

At that time, the Web was protected against corrupting influences because there was a lack of advertising. In 2000 advertising on the Web meant banner ads, which had the reputation of being over-priced and ineffective. Especially when the crash hit in 2001, many companies swore they would never again waste money on advertising on the Web. But Overture came up with the idea of buying keywords for specific searches, and then Overture was absorbed into Google and became the Google business model, and by 2007 or 2008 many businesses were willing to get serious about advertising on the Web. At which point advertising, as a business model, took off, and it helped drive the consolidation of the Web, and it helped give us the homogeneous Web of today.

For my tastes, the original blogosphere of 2000-2008 was one of the finest things the human race ever created, and the intellectual stimulation of the debates has no equal in any other age. At some point between 2007 and 2010 the Web began to consolidate around a few sites, especially YouTube and Twitter and Facebook. And so the early blogosphere died.

I should add, the worst aspects of the consolidation were driven, specifically, by the adoption of algorithms that are designed to maximize engagement, because the goal is to serve as many ads as possible. The algorithms have a deadening effect. They tend to promote a fairly small cluster of content, making invisible perhaps 99% of the content that is out there. Why? One of Clay Shirky’s points is that traffic to Web sites follows a power law. This means advertising revenue follows a power law. Indeed, Google now gets about 45% of all the revenue spent on Web advertising. Every company is under pressure to push itself as high along that power curve as possible. The smoother the curve, the more desperate the struggle: there is no safe spot where a company can tie up for a while and celebrate some aspect of its work other than mere page views. Shirky wrote:

Finally, there is no real A-list, because there is no discontinuity. Though explanations of power laws (including the ones here) often focus on numbers like “12% of blogs account for 50% of the links”, these are arbitrary markers. The largest step function in a power law is between the #1 and #2 positions, by definition. There is no A-list that is qualitatively different from their nearest neighbors, so any line separating more and less trafficked blogs is arbitrary.

But another of Shirky’s claims turned out to be incorrect:

Third, the stars exist not because of some cliquish preference for one another, but because of the preference of hundreds of others pointing to them. Their popularity is a result of the kind of distributed approval it would be hard to fake.

The algorithms are designed to fake a kind of preference, or to distort those preferences, because of the desperation of the company to move higher on the power law curve.

The point is, Shirky pointed to a natural process, and advertising money provided a powerful incentive to hack that natural process and try to distort it for the benefit of the company.

A good engineer can write utility functions or novelty functions. The novelty functions can use a variety of search strategies to turn up new material that might be of interest to a viewer, whereas if a company simply wants to maximize engagement, then it should prefer utility functions that surface material known to trigger engagement. Hopefully my point is clear: the arrival of advertising on the Web gave companies an incentive to favor utility functions over novelty functions. Whereas Geocities, circa 1996, encouraged horizontal exploration, YouTube nowadays favors an algorithm that steers people reliably to the most extreme or outrageous or inflammatory material on any subject.
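To make the distinction concrete, here is a minimal sketch in Python of one way to read it. Everything in it is an assumption made for illustration: the Item fields, the scoring formulas, and the numbers in the catalog are invented, and no real recommender is this simple. The utility function ranks by past engagement, while the novelty function ranks by how rarely an item has already been shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    past_engagement: float  # hypothetical historical click-through rate
    impressions: int        # hypothetical count of times already shown


def utility_score(item: Item) -> float:
    """Exploit: favor whatever has already triggered engagement."""
    return item.past_engagement


def novelty_score(item: Item) -> float:
    """Explore: favor items that have rarely been shown to anyone."""
    return 1.0 / (1.0 + item.impressions)


def rank(items, score):
    """Return the items sorted from highest to lowest score."""
    return sorted(items, key=score, reverse=True)


# Invented catalog: two heavily promoted items and two from the long tail.
catalog = [
    Item("outrage clip", 0.30, 5_000_000),
    Item("celebrity gossip", 0.22, 2_000_000),
    Item("Austerlitz lecture", 0.04, 1_200),
    Item("Pokemon fan essay", 0.03, 300),
]

by_utility = [item.title for item in rank(catalog, utility_score)]
by_novelty = [item.title for item in rank(catalog, novelty_score)]

print("utility:", by_utility)
print("novelty:", by_novelty)
```

With these made-up numbers the two rankings are exact mirror images: the utility ranking puts the long-tail items at the bottom, where few viewers will ever see them, and the novelty ranking puts them at the top. A real system would presumably blend the two, and the claim here is only that advertising pushes the blend toward utility.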

I personally often check the articles on Hacker News. It has no advertising. Perhaps that is why it functions as a good discovery service? But even there, it was more useful a few years ago, when there was less money associated with the site.

The long tail of the Web remains one of the world’s great resources, but there is currently no business model for encouraging its exploration. We can imagine such business models, perhaps variations on Mechanical Turk, with people working for content instead of money. But no one’s built such a company yet.

A final point: Clay Shirky seems to have underestimated how unstable the middle zone is, the zone where a writer writes an essay for a moderate-sized audience and can participate in any follow-on conversations. That’s gone. A post goes viral or it doesn’t. If it does, the resulting audience is too large to talk to, and if it doesn’t, there is often no one to talk to. Here is what Shirky predicted:

Meanwhile, the long tail of weblogs with few readers will become conversational. In a world where most bloggers get below average traffic, audience size can’t be the only metric for success. LiveJournal had this figured out years ago, by assuming that people would be writing for their friends, rather than some impersonal audience. Publishing an essay and having 3 random people read it is a recipe for disappointment, but publishing an account of your Saturday night and having your 3 closest friends read it feels like a conversation, especially if they follow up with their own accounts. LiveJournal has an edge on most other blogging platforms because it can keep far better track of friend and group relationships, but the rise of general blog tools like Trackback may enable this conversational mode for most blogs.

In between blogs-as-mainstream-media and blogs-as-dinner-conversation will be Blogging Classic, blogs published by one or a few people, for a moderately-sized audience, with whom the authors have a relatively engaged relationship. Because of the continuing growth of the weblog world, more blogs in the future will follow this pattern than today. However, these blogs will be in the minority for both traffic (dwarfed by the mainstream media blogs) and overall number of blogs (outnumbered by the conversational blogs.)