html5 data- attribute

For most of you the new html5 data- attribute should be quite familiar by now. It was one of the first additions of html5 to become available right away. No problems with backwards compatibility, no browsers/browser versions being a pain in the ass, just clear and easy to use. But it's just not a very sexy addition, which is why you'll only read about the attribute's best practices in more specialized places. A real shame, as it is an important step in further cleaning up our html code.

what and why?

Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.
These attributes are not intended for use by software that is independent of the site that uses the attributes.

That's what the spec has to say about the data- attribute. Basically it's an attribute (a name/value pair) that holds data which doesn't belong in the html as a text node. This data is not intended for users, it's there to aid scripts in their tasks. Following the "data-" prefix is a string that can be freely chosen (within the boundaries of regular xml naming constraints, and without uppercase letters), allowing multiple data- attributes with different suffixes to be placed on one single html element.

Before the data- attribute was introduced to the spec we had to be creative when script-only data was needed in our html code. We abused hidden input fields, we hid html elements from view with css, we even used title attributes to stuff our script data, and the really adventurous amongst us just made up their own attributes, extending the dtd if validation was a requirement. None of these methods were perfect; some were just plain hacks or failed to work in all circumstances. A standard alternative was needed and so the data- attribute was born.

use cases

The explanation so far has been quite theoretical, so let's find some real-life use cases for our attribute. These past months I've been able to distinguish three main use cases where the data- attribute proved to be an invaluable tool. As an example, let's take a run-of-the-mill web shop and see where our data- attribute comes in handy:

1. data for computations in scripts

< .. data-quantitystep="100" .. >

Adding products to your basket can be somewhat tricky depending on what you're trying to sell. If you are an online media shop things should be quite straightforward, but if you sell products that come in different units (like a 6-pack of coke vs 500gr of cheese) then there are more things to consider. If you use a custom +/- control to change your quantities, it will behave differently for different products. For a 6-pack of coke you could simply up the quantity by 1, but for the cheese you might opt to up the quantity in steps of 100gr at a time.

This value differs depending on the product people are trying to add to their basket, so instead of sticking the quantity step data in hidden input elements or making a lengthy passage through the back-end every time you add a product, we can now add this value as a data- attribute and have our scripts use that.
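To make this a bit more tangible, here is a minimal sketch of how a +/- control could pick up its step size. The function name stepQuantity, the "up"/"down" direction values and the fallback step of 1 are my own assumptions for illustration; only the data-quantitystep attribute comes from the example above.

```javascript
// Minimal sketch: apply the step stored in data-quantitystep to a quantity.
// `control` is any DOM element (or element-like object) exposing getAttribute().
// stepQuantity and the "up"/"down" values are hypothetical names.
function stepQuantity(control, currentQuantity, direction) {
  // Attribute values are always strings, so convert explicitly.
  const raw = control.getAttribute("data-quantitystep");
  const parsed = Number.parseInt(raw, 10);

  // Assumption: fall back to a step of 1 when the attribute is missing or invalid.
  const step = Number.isFinite(parsed) && parsed > 0 ? parsed : 1;

  const next = currentQuantity + (direction === "up" ? step : -step);

  // Never let the quantity drop below zero.
  return Math.max(0, next);
}
```

For the cheese (data-quantitystep="100") a click on the + control takes the quantity from 500 to 600, while the 6-pack of coke, which carries no attribute in this sketch, simply goes up by 1.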

2. data for changed state

< .. data-replace="close extra information" .. >

Sometimes you encounter data on a page that changes depending on the state of a certain component. A typical expand/collapse component will often feature a control handler with a textual open/close indication. This open/close text is dependent on the state of the expand/collapse component, so putting them both in the html as text nodes is not really the way to go. If you disabled your css, both text nodes would show up, which is confusing to say the least. Another option is to add the changed state text in javascript, but if your site is multilingual this makes quite a mess of your javascript file. And all things considered, javascript is just no place for managing your content.

Using the data- attribute though, we can have the changed state string in our html without it ever showing up. The script can extract the data from the attribute when needed and can substitute the original value back into the data- attribute (anticipating the next state-change of the component).
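The swap described above could be sketched roughly like this. The function name toggleLabel is made up for the occasion, and I'm assuming the control's visible text lives directly in its textContent; a real component would probably target a nested element.

```javascript
// Minimal sketch: swap a control's visible text with the text stored
// in its data-replace attribute. toggleLabel is a hypothetical name.
function toggleLabel(control) {
  const nextLabel = control.getAttribute("data-replace");

  // Nothing to swap when the attribute is absent.
  if (nextLabel === null) {
    return;
  }

  // Park the current text in the attribute, anticipating the next state change.
  control.setAttribute("data-replace", control.textContent);
  control.textContent = nextLabel;
}
```

Calling toggleLabel on every click keeps both strings in the html (and out of the javascript file), with only one of them ever visible to the user.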

3. help your back-end developer

< .. data-productid="025652156" .. >

And finally, the data- attribute can even be used to help out your back-end developers a little. Sometimes it's easier to just add extra metadata to your html code, facilitating ajax calls and other back-end operations. If you can include the database product identifier for each product in the html, ajax calls handling a product could be made quicker and easier as the product identifier can be passed on, instead of going through a few extra loops and queries on the back-end side trying to find out which product is being added.

In the past this was usually done through hidden inputs, now we can just use the data- attributes. Mind though that in some (most?) situations a hidden input element might still be preferred, especially if you are planning for form-submit fallback (when the user has no javascript). The hidden input is then submitted while the data- attribute is lost to the back-end. Definitely something to keep in mind.
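As a rough sketch of the ajax side: the function below builds a form-encoded request body from the data-productid attribute. The function name and the productid/quantity parameter names are assumptions on my part, your back-end will dictate the real ones.

```javascript
// Minimal sketch: build the body of an "add to basket" ajax call from
// the data-productid attribute. buildAddToBasketBody and the parameter
// names "productid" and "quantity" are hypothetical.
function buildAddToBasketBody(productElement, quantity) {
  const productId = productElement.getAttribute("data-productid");

  if (!productId) {
    throw new Error("Element has no data-productid attribute");
  }

  // Keep the id as a string: converting "025652156" to a number
  // would silently drop the leading zero.
  const params = new URLSearchParams({
    productid: productId,
    quantity: String(quantity),
  });

  return params.toString();
}
```

The resulting string can be sent as the body of a POST request, which also happens to be the same format a regular form submit with hidden inputs would produce.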

internal vs external

If you check the spec again, you will also notice that the data- attribute is only meant for data that is to be used by internal scripts, meaning scripts you have specifically developed to work on the website you're building. A strange limitation that begged for further explanation, but none was given in the spec itself. My first guess was that they included this to prevent abuse of the data- attribute (seo keyword stuffing that would be picked up by search engines) but that felt like a big price to pay for something that cannot be stopped either way.

So I went to the whatwg (irc ftw! - check the logs) and asked around for more concrete information. It turns out that it's not really an issue of abuse, but of possible conflict. Since there is no governing entity and everyone can freely decide the name of the attribute, collisions might occur. Google might be using data-contentid for one thing, while Amazon might be using the same attribute for something else.

Fair enough, but that doesn't solve our problem when we need to provide extra data for external scripts. I pushed the whatwg for alternatives and even though their first options were all less than satisfactory (using microformats - meaning you still need to add your data as text nodes - or using the itemid property - meaning you're limited to only one property), there is one way to work around the scope limitations of the data- attribute:

<div itemscope itemtype="..."> <meta itemprop="productid" content="025652156" /> </div>

Apparently there's an edge case where meta elements can be used outside of the head of an html document. When combined with an itemprop attribute they serve the same purpose as data- attributes, but for data targeted at external scripts. This was the first I ever heard of it, but all in all it's a decent solution that fits the whole microdata implementation worked out in html5.
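For what it's worth, an external script could pick up such values along these lines. This is only a sketch under simplifying assumptions: readItemProps is a made-up name, it only looks at meta elements, and it ignores nested items, which a proper microdata parser would have to handle.

```javascript
// Minimal sketch: collect itemprop/content pairs from the meta elements
// inside one item. readItemProps is a hypothetical name; `root` is the
// element carrying the item (or any object exposing querySelectorAll()).
function readItemProps(root) {
  const props = {};

  for (const meta of root.querySelectorAll("meta[itemprop]")) {
    // Assumption: each property name appears once; later values overwrite earlier ones.
    props[meta.getAttribute("itemprop")] = meta.getAttribute("content");
  }

  return props;
}
```

Run against the div above, this would yield an object with a single productid entry holding "025652156".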

The only problem of course is that this is way more complex than introducing some or other data- attribute to hold your data, as you need external documentation for your microdata semantics and structure. Figuring this out as a front-end developer is hard enough, getting your back-end developers on board is a completely different challenge. I fear that the cost of microdata is just too high to make this a very workable solution, especially when nothing is stopping you from just using a data- attribute. And if you choose your attribute name wisely (what's the chance of data-amazonproductid ever appearing on some site not intended for amazon?) there shouldn't be much of a problem.

conclusion

Apart from the internal vs external discussion, the data- attribute is ready for use (and has been for quite a while now) and proves a very handy way to conceal script data from agents and users alike. It's a valuable addition that helps you clean up your html by removing unnecessary text nodes and hidden inputs where they aren't needed, and it even helps in cleaning up your javascript files to make them more robust and less data dependent. Just remember that you might still need some hidden input elements, especially if form-submit fallback is required.

As for the microdata alternative (external scripts), I'm still not too sure. I'd be inclined to ignore it for the time being, hoping that we'll once again face a "pave the cow path" situation in future releases of the html spec.