Pareto analysis of HTML

HTML stands for HyperText Markup Language. As the name says, it is a markup language, and it is intended mainly for text. Its original purpose was to give meaning to the text documents that reside on the World Wide Web.

This original intention of HTML got somewhat lost over the last ten years of web development, and the large majority of web developers are only now waking up to this fact.

I would like to provide a Pareto analysis of HTML as it is used on semantic web development projects on a day-to-day basis, that is, the relatively small set of elements that does most of the work. Without fully understanding these tags, you are unlikely to be able to create a proper, semantic web site today.

There are around 100 HTML elements defined in the W3C specification. However, developers tend to use only a portion of them on a day-to-day basis.
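A Pareto analysis like this can be approximated by counting how often each element appears in real pages and finding the smallest set of elements that accounts for most of the tags. The sketch below does that with Python's standard `html.parser` module. It is a minimal illustration, not the method used for this post; the helper name `pareto_core`, the 80% default threshold and the sample markup are hypothetical choices made for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times each element's start tag occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <P> and <p> are counted together.
        # XHTML-style <img /> also arrives here via handle_startendtag's default.
        self.counts[tag] += 1


def pareto_core(markup, threshold=0.8):
    """Hypothetical helper: return the fewest element names whose combined
    usage reaches `threshold` (0.8 = 80%) of all start tags, plus the raw counts."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    total = sum(parser.counts.values())
    core, covered = [], 0
    # most_common() lists elements from most to least frequent.
    for tag, n in parser.counts.most_common():
        if covered >= threshold * total:
            break
        core.append(tag)
        covered += n
    return core, parser.counts


# Hypothetical sample markup: six list items, two paragraphs, one emphasis.
sample = "<ul>" + "<li>item</li>" * 6 + "</ul><p>a</p><p>b</p><em>c</em>"
core, counts = pareto_core(sample)
print(core)  # ['li', 'p']
```

Run over the saved source of a real site, comparing the length of `core` with the number of distinct elements in `counts` gives a rough idea of how concentrated element usage is. Counting tags says nothing about whether each element is used with the right meaning, which is the harder question.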

Highlighted in the list below are those elements which I deem to carry some semantic ambiguity. In other words, developers tend to misunderstand the meaning behind these tags, which leads to their misuse to a lesser or greater degree.

I will explain separately why I think these tags are misused and how they ought to be used.

  1. <div>
  2. <span>
  3. <p>
  4. <a>
  5. <table>
  6. <tr>
  7. <thead>
  8. <tbody>
  9. <td>
  10. <th>
  11. <caption>
  12. <img>
  13. <form>
  14. <input>
  15. <label>
  16. <fieldset>
  17. <textarea>
  18. <select>
  19. <style>
  20. <option>
  21. <legend>
  22. <html>
  23. <head>
  24. <title>
  25. <link>
  26. <meta>
  27. <script>
  28. <body>
  29. <em>
  30. <strong>
  31. <abbr>
  32. <address>
  33. <h1>
  34. <h2>
  35. <h3>
  36. <ul>
  37. <ol>
  38. <li>
  39. <dl>
  40. <dd>
  41. <dt>
  42. <object>
  43. <embed>

With the elements listed above, you should be able to semantically mark up roughly 90% of any web site today, small or large.
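One rough way to see how far a given page stays within this core set is to list the elements it uses that are not in the list above. The sketch below assumes Python's standard `html.parser`. The `CORE_ELEMENTS` set simply copies the 43 listed elements; the `elements_outside_core` helper and the sample page are hypothetical. It only checks which elements appear, not whether they are used with the right meaning.

```python
from html.parser import HTMLParser

# The 43 elements from the list above.
CORE_ELEMENTS = {
    "div", "span", "p", "a", "table", "tr", "thead", "tbody", "td", "th",
    "caption", "img", "form", "input", "label", "fieldset", "textarea",
    "select", "style", "option", "legend", "html", "head", "title", "link",
    "meta", "script", "body", "em", "strong", "abbr", "address",
    "h1", "h2", "h3", "ul", "ol", "li", "dl", "dd", "dt", "object", "embed",
}


class ElementCollector(HTMLParser):
    """Records the (lower-cased) name of every start tag in a document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def elements_outside_core(markup):
    """Hypothetical helper: sorted names of elements not in CORE_ELEMENTS."""
    parser = ElementCollector()
    parser.feed(markup)
    parser.close()
    return sorted(parser.seen - CORE_ELEMENTS)


# Hypothetical contact page marked up almost entirely with core elements.
SAMPLE_PAGE = """
<html>
<head><title>Contact us</title></head>
<body>
<h1>Contact us</h1>
<p>Write to <abbr title="Acme Widgets Ltd">AWL</abbr> at:</p>
<address>1 Example Street</address>
<form action="/send" method="post">
<fieldset>
<legend>Your message</legend>
<label for="msg">Message</label>
<textarea id="msg" name="msg"></textarea>
<input type="submit" value="Send">
</fieldset>
</form>
<blockquote>Great service!</blockquote>
</body>
</html>
"""

print(elements_outside_core(SAMPLE_PAGE))  # ['blockquote']
```

Here the only element outside the core set is `<blockquote>`, the kind of less common element that would fall into the remaining 10% or so.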

It is interesting to observe that most web sites today use fewer than half of the elements defined in the HTML specification.

This suggests that much of the HTML tag set is effectively obsolete, and that the HTML specification is arguably sufficient for the purposes it was developed for.

Some web developers may disagree with this statement, feeling that HTML is missing elements which could be introduced into HTML5/XHTML2 to make some common tasks easier and faster to complete.

However, in conversations with developers, I have rarely met one who could suggest any new tag, let alone one that was meaningful and went beyond the requirements of an average blog site.

Written by Jason Grant, BSc, MSc on 4th June 2008


7 Responses

  1. That’s true, Jason. I sometimes misuse my divs, spans and fieldsets. And abbr and acronym were confusing, especially in old Internet Explorer.

    Jason, I think the author name field of the comment form needs a tabindex.

    dani on 7th April
  2. Thanks, Dani, for spotting the tabindex issue with the form.

    I will be changing this design (it is currently one of the WordPress themes) and will be fixing various issues when I get round to that task.

    Thanks for signing up, and I hope you will contribute on a regular basis.

    Jason Grant on 7th April
  3. The tabindex issue has now been fixed on this blog.

    Jason Grant on 15th April
  4. Jason,
    may I ask you to add a subscribe-to-comments feature/plugin to this great blog?

    And are there any plans to check this blog against the W3C semantic extractor to make its use of <div> more semantic?

    dani on 20th April
  5. Dani,

    I have now added the subscribe-to-comments plugin. I hope it works.

    As for the design and code of this blog, I cannot account for them, as this is an off-the-shelf template (hence the credit to the original author at the bottom).

    I don’t really like it (either the look or the code), so I will be changing it at some point soon.

    Jason Grant on 20th April
  6. Jason,
    lately I have found that tabindex is not needed on a daily basis, except in complex forms.
    I think it would be more usable if you linked some of your related posts here (on the semantic use of elements such as div, label, h1-h6, p and span, just like on your contents page).

    And for a more usable flow, the notify-comment-by-mail line should be placed before the submit button, right?

    dani on 6th October
