abstract content categories

deconstructing a web page

One of the main challenges of an html guy is coming up with proper ways to name different components. How you plan to do this is beyond the scope of this article (using your own class names, microdata or microformats are all valid options), but the actual act of naming them is something that deserves some extra attention. For me it's a realization that grew over time and one I had to figure out on my own, as little is written about this topic. So here goes.

html brings light into the darkness

html is about structure and semantics. In all the years I've been blogging I've found myself repeating this over and over again, but that's just because it's one of the purest (and actually one of the few) truths in our profession. It's the baseline and starting point of any proper argument you can have on different ways to write html.

What this means is that html should provide as much clarity as possible on the content it describes and reduce the chance of ambiguous assumptions. Both structural and semantic information are important so that humans (less important) and machines (very important) can try to analyze your content and use it for their own specific purposes. This ranges from search result optimization to screen reader software offering your content in well-structured, bite-sized portions.

top-down semantics

When I first started to learn about html (and its semantic value) I was clinging to a very narrow view of what semantics was all about. Trying to find a proper and descriptive name for a component happened with little regard to any relations it had to other existing components. It was really an exercise in "what is the best name to describe this thing" without wondering about "... and how does it fit in with the rest of my page/site".

As you write more and more html code you find yourself making connections between certain components. When I was just starting out there was a moment when I realized it might be good to somehow group all navigation components together. I started using a prefix (nav) for classes referencing navigation components. Nowadays html5 has a dedicated element for this specific case: nav. The benefit of doing this: screen readers now have an easier job finding site navigation and offering it in modified form to their users.
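
As a small illustration of that shift, here is a sketch of the same navigation block written both ways. The class name nav-main and the links are made up for this example; only the nav element itself comes from html5.

```html
<!-- before: a generic container, recognizable as navigation only
     through the "nav" class prefix (hypothetical class name) -->
<div class="nav-main">
  <ul>
    <li><a href="/">home</a></li>
    <li><a href="/articles/">articles</a></li>
  </ul>
</div>

<!-- after: the nav element marks the block as navigation by itself,
     the class is only kept as a styling hook -->
<nav class="nav-main">
  <ul>
    <li><a href="/">home</a></li>
    <li><a href="/articles/">articles</a></li>
  </ul>
</nav>
```

In the first version the grouping only exists as a convention between you and your stylesheet. In the second it's part of the markup itself, so software that knows html can pick it up.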

While experience will teach you these things over time, I feel that bottom-up grouping is not the best way to start off. So let's see what a page looks like from the top, working our way down to the level of individual components.

3 abstract content groups

I found that just about any page out there can be split up into a combination of three different abstract content groups. Note that none of these groups is actually required to build a proper web page. A quick rundown:

branding

The smallest group of the abstracts. These elements have no actual value to the user beyond making them feel at ease as they recognize your brand and trust you to offer them the information they need. Most branding is done through css styling, but logos and taglines are clearly html elements with the sole purpose of branding a webpage. Branding is one of the key priorities of the author, but users really don't care all that much.

page content

Page content is what brings you to a site. It's the informative data or needed actions you hope to find when surfing the web. Not all pages have to contain page content; some pages are merely gateways to other pages where you'll (hopefully) find what you are looking for. Most leaf pages (in your content tree) are heavy on page content.

Mind that page content goes beyond mere text, images or label/value pairs. A contact form also belongs to the page content as it is a clear, valuable user action.

redirects

Redirects are all elements on a page that aim to pull you away from the page you're currently looking at. Rather than actual content, these elements offer you gateways to other content that can be found on the web (as a whole), on your own particular website or even on the very page they're on.

Ads and navigation belong to this category, but so do shortlists (e.g. latest news) and search boxes. When analyzing a site, you'll find that this is often the largest group of abstracts you have to deal with.

just another useless categorization?

What's particularly interesting about this way of categorizing things is that it allows you to separate fluff and noise from the actual content your site is based upon. If you somehow succeeded in getting this categorization into your html code, a program could run through your site and extract all your unique content, skipping navigation, ads, shortlists and other duplicate content.

This is not an alien idea; there are already programs out there (Safari Reader for example) that try to do this. Sadly these programs can only base their output on vague assumptions and guesses, so they can't guarantee a proper result. This is because html lacks a proper hook for them to rely on.
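
To make the idea of such a hook a bit more concrete, here is a rough sketch of what marking up the three groups could look like. The data-content-group attribute and its values are purely hypothetical, invented for this example: nothing in html defines them and no existing program reads them. The file names and urls are placeholders too.

```html
<!-- data-content-group is a hypothetical attribute, not part of any standard -->
<body>
  <header>
    <!-- branding: logo and tagline -->
    <div data-content-group="branding">
      <img src="logo.png" alt="my site">
      <p>a tagline goes here</p>
    </div>

    <!-- redirects: navigation and search box -->
    <nav data-content-group="redirect">
      <ul>
        <li><a href="/articles/">articles</a></li>
      </ul>
    </nav>
    <form action="/search" data-content-group="redirect">
      <input type="search" name="q">
    </form>
  </header>

  <!-- page content: the article itself -->
  <main data-content-group="page-content">
    <h1>article title</h1>
    <p>the unique content of this page</p>
  </main>

  <!-- redirect: a shortlist pointing to other pages -->
  <aside data-content-group="redirect">
    <h2>latest news</h2>
    <ul>
      <li><a href="/news/some-item/">some news item</a></li>
    </ul>
  </aside>
</body>
```

With something like this in place, a reader-style program could keep everything marked as page content and skip the rest, rather than having to guess.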

conclusion

While these abstracts will probably not be reflected in your final html code anytime soon, they will still reveal some interesting subtleties. I used to think that a search box and a contact form were closely related components; now I feel that a search box is a clear redirect, meaning it's more closely related to the group of navigational components (though I firmly believe a search box doesn't actually belong to the category of navigation).

This article was written from an html perspective, but obviously these categorizations have a much broader impact than html alone. In time they might influence the way you structure your css and javascript (a direct result of restructuring your html). Needless to say, they are also useful when starting to wireframe a new site. They might help you make better judgments when deciding what elements to remove or switch around when taking a mobile-first, responsive approach, or they might help you balance your pages, making sure you have enough unique content on offer.

All this just because I was planning to write an article on a special category of redirect components, but I'll leave that for next time.