Adding Structure to Blog Posts
As you probably know (since you’re reading this!), blogs are [usually open access] websites that contain periodic time-stamped posts (in reverse chronological order) about a particular genre or touching on a number of topics of interest. They range from individuals’ online diaries or journals to promotional tools used by companies or political campaigns, and many allow public commenting on their posts. They are also starting to cross the generation gap – your kids might have a blog on Bebo, you may blog yourself and your parents could be reading or commenting on your posts.
The growth and take-up of blogs over the past four years has been dramatic, with a doubling in the size of the blogosphere every six months or so (according to statistics from Technorati). Over 100,000 blogs are created every day, working out at about one a second. Nearly 1.5 million blog posts are being made each day, with over half of bloggers contributing to their sites three months after the blog’s creation.
Just as you might accidentally wander onto message boards and web-enabled mailing lists when you’re searching for something on the Web, you may often happen across a relevant entry on someone’s blog. RSS feeds are also a useful way of accessing information from your favourite blogs, but they are usually limited to the last 15 entries, and don’t provide much information on exactly who wrote or commented on a particular post, or what the post is talking about. Some approaches like SIOC aim to enhance the semantic metadata provided about blogs, forums and posts, but there is also a need for more information about what exactly a person is writing about. If you’re searching for particular information in or across blogs, it’s often not that easy to find, partly because of “splogs” (spam blogs) and partly because the virtue of blogs so far has been their simplicity: apart from the subject field, everything and anything is stored in one big text field for content. Keyword searches may give some relevant results, but useful questions such as “find me all the restaurants that bloggers reviewed in Dublin with a rating of at least 5 out of 10” cannot be posed, and you cannot easily drag-and-drop events or people or anything (apart from URLs) mentioned in blog posts into your own applications.
I’m going to talk about two approaches to tackle this issue of adding more information to posts, so that queries can be made and the things that people talk about can be reused in other posts or applications, because not everyone is being served well by the lowest common denominator that we currently have in blogs. The first is called structured blogging and the second semantic blogging. (I’ll cover semantic blogging in my next installment…)
“Structured blogging” is an open source community effort that has created tools to provide microcontent (including microformats like hReview) from popular blogging platforms such as WordPress and Movable Type. In structured blogging, packages of structured data are becoming post components. Sometimes (not all of the time) you will need more structure in your posts. If you know a subject deeply, or if your observations or analyses recur in a similar manner throughout your blog, then you may be best served by filling in a form (which has its own metadata and model) during the post creation process. For example, you may be writing a review of a film you went to see, or a report on a sports game you attended, or a guide to tourist attractions you saw on your travels. Not only do people get to express themselves more clearly, but blogs can start to interoperate with enterprise applications through the microcontent that is being created in the background.
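To make the idea of microcontent a little more concrete, here is a minimal sketch in Python of a review stored as structured fields, rendered to HTML with hReview-style class names, and then queried. Everything here is an assumption for illustration: the RestaurantReview record, the function names and the sample restaurants are made up, the class names follow my reading of the hReview draft in simplified form, and none of it is the actual output of the structured blogging plugins.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class RestaurantReview:
    """Hypothetical record for one structured review post (not the plugins' schema)."""
    restaurant: str
    city: str
    rating: int  # score out of 10
    summary: str


def to_hreview_html(review: RestaurantReview) -> str:
    """Render a review as HTML tagged with hReview-style class names (simplified)."""
    return (
        '<div class="hreview">'
        f'<span class="item"><span class="fn">{escape(review.restaurant)}</span></span>, '
        f'{escape(review.city)}: '
        f'<span class="rating"><span class="value">{review.rating}</span>'
        ' out of <span class="best">10</span></span>. '
        f'<span class="summary">{escape(review.summary)}</span>'
        '</div>'
    )


def find_reviews(reviews, city, min_rating):
    """Return the reviews for `city` that score at least `min_rating`."""
    return [r for r in reviews if r.city == city and r.rating >= min_rating]


# Made-up sample posts, purely for illustration.
reviews = [
    RestaurantReview("Example Bistro", "Dublin", 7, "Friendly and good value"),
    RestaurantReview("Sample Grill", "Dublin", 4, "Slow service"),
    RestaurantReview("Demo Diner", "Cork", 9, "Worth the trip"),
]

# "Restaurants reviewed in Dublin with a rating of at least 5 out of 10"
for review in find_reviews(reviews, "Dublin", 5):
    print(to_hreview_html(review))
```

The point of the sketch is that once the city and rating live in their own fields rather than in one big text field, the restaurant question becomes a simple filter instead of a keyword search.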
Let’s say that someone (or a group of people) is reviewing some soccer games that they watched. Their after-game soccer reports will typically include information on which teams played, where the game was held and when, who the officials were, and what the significant game events were (who scored, when and how, or who received penalties and why, etc.). It’d be great if these blog posters could use a tool that would understand this structure, presenting an editing form with the relevant fields and creating both HTML and RSS with this structure embedded in it. Then other people reading these posts could say, “hey, I want to reuse this structure in my own posts” and their blog reader / creator could make this structure available when the blogger is ready to write. As well as this, reader applications could begin to answer questions based on the form fields available – “show me all the matches from Germany with more than two goals scored”, etc.
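As a rough sketch of what such a soccer report structure and query might look like, here is a small Python example. The MatchReport and Goal records, their field names and the sample games are all hypothetical, and I am assuming that “matches from Germany” means matches held in Germany.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    scorer: str
    team: str
    minute: int


@dataclass
class MatchReport:
    """Hypothetical fields for an after-game soccer report."""
    home_team: str
    away_team: str
    venue: str
    country: str    # where the game was held
    played_on: str  # ISO date string
    referee: str
    goals: List[Goal] = field(default_factory=list)


def matches_with_more_goals_than(reports, country, threshold):
    """Return matches held in `country` with strictly more than `threshold` goals."""
    return [r for r in reports if r.country == country and len(r.goals) > threshold]


# Made-up reports, purely for illustration.
reports = [
    MatchReport("Team A", "Team B", "Stadium One", "Germany", "2006-01-01", "Referee X",
                [Goal("Player 1", "Team A", 12), Goal("Player 2", "Team B", 40),
                 Goal("Player 1", "Team A", 77)]),
    MatchReport("Team C", "Team D", "Stadium Two", "Germany", "2006-01-02", "Referee Y",
                [Goal("Player 3", "Team C", 55)]),
    MatchReport("Team E", "Team F", "Stadium Three", "Ireland", "2006-01-03", "Referee Z",
                [Goal("Player 4", "Team E", 5), Goal("Player 5", "Team F", 30),
                 Goal("Player 4", "Team E", 61)]),
]

# "Show me all the matches from Germany with more than two goals scored"
for report in matches_with_more_goals_than(reports, "Germany", 2):
    print(f"{report.home_team} v {report.away_team} at {report.venue}: {len(report.goals)} goals")
```

A reader application would of course get these fields from the structure embedded in the HTML or RSS rather than from hand-written records, but the query itself would look much the same.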
At the moment, the structured blogging tools do provide a fixed set of forms that bloggers can fill in (see the WordPress restaurant review form on the right) – for things like reviews, events, audio, video and people – but there is no reason that people couldn’t create custom structures, and news aggregators or readers could auto-discover an unknown structure, notify a user that a new structure is available, and learn the structure for reuse in the user’s future posts.
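One way to picture that auto-discovery step is the following Python sketch. It is only an illustration under my own assumptions: the StructureRegistry class, its methods and the structure names are hypothetical, and the existing structured blogging tools do not (as far as I know) work this way.

```python
class StructureRegistry:
    """Hypothetical store of the post structures a reader application has seen."""

    def __init__(self):
        self._fields_by_name = {}

    def observe(self, name, fields):
        """Record a structure; return True only if it was new to this reader."""
        if name in self._fields_by_name:
            return False
        self._fields_by_name[name] = tuple(fields)
        return True

    def blank_form(self, name):
        """Return an empty form (field name mapped to "") for a learned structure."""
        return {field_name: "" for field_name in self._fields_by_name[name]}


registry = StructureRegistry()

# Structures found while reading incoming posts (made-up examples).
incoming = [
    ("review", ["item", "rating", "summary"]),
    ("soccer-report", ["home_team", "away_team", "venue", "goals"]),
    ("review", ["item", "rating", "summary"]),
]

for name, fields in incoming:
    if registry.observe(name, fields):
        # Notify the user the first time an unknown structure turns up.
        print(f"New structure available: {name}")

# Later, the blogger can start a post from a learned structure.
print(registry.blank_form("soccer-report"))
```

The registry only remembers field names here; a real tool would also need to carry each form’s metadata and model along with it.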
There have been some other past efforts with similar aims to the structured blogging community, including Qlogger, the Lafayette project, and JemBlog. And in the future, Semantic Web technologies could be used to ontologise any available post structures for more linkage and reuse… This neatly brings me on to semantic blogging, which I’ll discuss in the next post!