html5 article

on syndication and content types

Ever since the article tag was introduced in html5 I've been struggling to wrap my head around its practical implications. I've written about the difference between section and article before, but this time I'll keep a tighter focus on the article element itself and how to approach it when writing html code. As the current definition leaves too much room for doubt and misinterpretation, we need something more tangible to guide us along.

what changed since last time

A lot has been written about the article element already, but many of these posts are based on an older definition of the html5 article. The definition received a small (yet important) update not too long ago, making it at least a little more relevant for everyday use. Here's the definition I quoted last time (and which can be found in most articles google turns up when searching for html5 article info):

The article element represents a component of a page that consists of a self-contained composition in a document, page, application, or site and that is intended to be independently distributable or reusable, e.g. in syndication.

html5 doctor

If you check the w3c or whatwg site now though, you'll come across the current version:

The article element represents a self-contained composition in a document, page, application, or site and that is, in principle, independently distributable or reusable, e.g. in syndication.

w3c

The big difference? Well, the article element went from something that is intended for syndication to something that is, in principle, syndicatable. It may sound like a minor change in phrasing, but it means that we can now use the article element for content that is not actually being syndicated, but could be (in principle). With the first definition, you'd be (at least in theory) forced to change the html code of your site the moment it was decided that a particular piece of code would no longer be syndicated (as in, we'll stop the rss feed of our event data). Rather than describing content that is syndicated, the article element now describes content that could be syndicated.
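
To make that concrete, here is a minimal sketch of an event item. The class name, heading, date and text are made up for illustration and not taken from any real site. Under the current definition this markup stays the same whether or not an rss feed for events happens to exist:

```html
<!-- Hypothetical event item: all names and values are illustrative. -->
<article class="event">
  <h2>Front-end meetup</h2>
  <p><time datetime="2011-03-14">14 March 2011</time></p>
  <p>An evening of talks on html5 and css3.</p>
</article>
```

Dropping the feed changes nothing here, since the element only claims that the event could be distributed on its own, not that it actually is.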

on syndication

I am somewhat bothered though by the introduction of a term like "syndication" in the html spec. Syndication is a description that is neither semantic nor structural, so it feels pretty much out of place in there. Furthermore, as a content owner and believer in the semantic web, I don't really care what part of my content is syndicatable. As long as I am properly quoted, crawlers can pass by and scrape whatever piece of content they want. They want to include my main navigation in their site? Sure, why not. Want to get my rss links? Just take 'em. Building a front-end newsletter form aggregation site? Go ahead, crawl my site and take whatever you see fit.

On top of that, I'm not planning to ask myself whether there is a possible scenario for syndication each and every time I write a div or section. I'd go insane, as most of the time I could probably come up with some obscure reason why someone would still want to syndicate that particular part of their site. Also note that the definition of syndication is broader than "it can appear in an rss feed". It covers practically every situation where you as a site owner would like to offer a piece of your code to an external source.

One final (and important) remark about syndication is that it is just cited as an example in the current definition. The e.g. list is not exhaustive, meaning syndication is just one example of many. People often refer to syndication (and rss feeds) when talking about the article element, but the real focus lies on "independently distributable or reusable".

on self-contained

Note that the current definition actually holds two separate requirements for using the article element. Being independently distributable or reusable is just one part of the definition; your piece of code should also be self-contained. Again, your mileage may vary, as people will interpret this part of the definition differently.

The most popular example to illustrate this vagueness is the wrapping of a blog comment in a separate article tag. While it is not uncommon to offer comments through an rss feed, the question remains whether a comment is really self-contained. Nobody doubts that a comment can exist by itself and holds all the data needed to properly define itself. The real question is whether it has any real value outside of its immediate context. Distributing a comment without any of the other comments doesn't always make sense, especially when the commenter didn't bother to quote the previous comments they are reacting to. It becomes just another random blob of text, not that much different from a simple paragraph.

Again, it's a tricky issue that could fuel days of discussion, which only leads me to believe that "self-contained" is not all that fit as a criterion for defining the proper use of the article element.
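
For reference, the pattern under discussion looks roughly like the sketch below. The html spec itself mentions comments nested inside a blog entry's article as a possible use of nested article elements. The headings, names, dates and class names here are placeholders I made up:

```html
<!-- Hypothetical blog post with nested comment articles. -->
<article class="post">
  <h1>html5 article</h1>
  <p>The body of the blog post goes here.</p>

  <section class="comments">
    <h2>Comments</h2>

    <!-- An inner article is, in principle, related to the outer article. -->
    <article class="comment">
      <h3>Comment by a visitor</h3>
      <p><time datetime="2011-03-14">14 March 2011</time></p>
      <p>I disagree with the previous commenter.</p>
    </article>
  </section>
</article>
```

Lift the inner article out of this page and the sample comment shows the problem: "the previous commenter" no longer refers to anything.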

abstraction: on content types

Taking one step back, let's see if we can figure out why people felt the need to create an article element in the first place. This is just second-guessing of course, but it might help us to get a little closer to the core purpose of this new tag. Syndication probably wasn't the incentive; I think meta data and concepts like that would be a better fit as an attribute than as a separate tag. And for sectioning pages, the spec already lists the section tag.

Straying away from edge cases and fuzzy definitions for a minute, we'll find some proper and indisputable uses for the article element when marking up data like news, events, products, reviews, contacts ... and yeah, even comments. People with a little understanding of CMSes like Drupal will recognize these as content types. Content types are a way of describing and entering structured data, and of displaying that data in different views (shortlist, overview list with filters, detail) all across a site.

If you keep to this perspective, all of a sudden the use cases for the article element become a lot clearer. Content type instances are typically self-contained, make sense as syndicated content and are definitely reusable (on your own site as well as on someone else's). The only difference compared with the current definition is that you probably lose some obscure edge cases (like a newsletter subscription box) in the process.
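
As a rough sketch of what this approach looks like in markup, assume a hypothetical "news" content type shown in a shortlist view, next to a newsletter box. All class names, titles, dates and the form action are invented for the example:

```html
<!-- Shortlist view: every instance of the news content type is an article. -->
<section class="news-shortlist">
  <h2>Latest news</h2>

  <article class="news teaser">
    <h3><a href="/news/first-item">First news item</a></h3>
    <p><time datetime="2011-03-14">14 March 2011</time></p>
    <p>A short teaser for the first item.</p>
  </article>

  <article class="news teaser">
    <h3><a href="/news/second-item">Second news item</a></h3>
    <p><time datetime="2011-03-15">15 March 2011</time></p>
    <p>A short teaser for the second item.</p>
  </article>
</section>

<!-- Not a content type instance, so it stays a section under this approach. -->
<section class="newsletter">
  <h2>Newsletter</h2>
  <form action="/subscribe" method="post">
    <label for="email">Email address</label>
    <input type="email" id="email" name="email">
    <button type="submit">Subscribe</button>
  </form>
</section>
```

The detail view of a news item would wrap the full text in the same article element, so the choice of tag follows the content type rather than the view it happens to appear in.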

conclusion

So even though using the article element for content types only might exclude a couple of valid use cases when held against the standing definition, the clarity it brings makes it a lot easier to decide whether to use the article tag or just stick with a section/div.

That said, it feels as if the content type approach lies closer to the original intentions of the article tag, which were then clouded by a definition that's way too fuzzy. That could be just my personal interpretation of course, but for now I'll stick to using the article element for content type instances exclusively, and I suggest you do the same until the next article spec update.