Pareto analysis of HTML

Wednesday, June 4th, 2008

HTML stands for HyperText Markup Language. It is a markup language intended mainly for text, and its initial purpose was to give meaning to text documents that reside on the World Wide Web.

This initial intention of HTML got somewhat lost over the last 10 years of web development, and a large majority of web developers are only now waking up to this fact.

I would like to provide a Pareto analysis of HTML as it is used on semantic web development projects on a day-to-day basis. Without fully understanding the tags listed below, you are unlikely to be able to create a proper, semantic web site today.

There are around 100 HTML elements which have been made part of the W3C specification. However, only a portion of those tend to be used by developers on a day-to-day basis.

Highlighted in the list are those elements which I deem to contain some amount of semantic ambiguity. In other words, developers tend to misunderstand the meaning behind these tags, which leads to their lesser or greater misuse.

I will explain separately the reasons why I think these tags are misused and how they ought to be used.

  1. <div>
  2. <span>
  3. <p>
  4. <a>
  5. <table>
  6. <tr>
  7. <thead>
  8. <tbody>
  9. <td>
  10. <th>
  11. <caption>
  12. <img>
  13. <form>
  14. <input>
  15. <label>
  16. <fieldset>
  17. <textarea>
  18. <select>
  19. <style>
  20. <option>
  21. <legend>
  22. <html>
  23. <head>
  24. <title>
  25. <link>
  26. <meta>
  27. <script>
  28. <body>
  29. <em>
  30. <strong>
  31. <abbr>
  32. <address>
  33. <h1>
  34. <h2>
  35. <h3>
  36. <ul>
  37. <ol>
  38. <li>
  39. <dl>
  40. <dd>
  41. <dt>
  42. <object>
  43. <embed>

With the above list of HTML elements you should be able to semantically mark up roughly 90% of any web site today, small or large.
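A claim like this can be checked against real pages by counting which tags a document actually uses. The following is only a rough sketch in Python, not the method used to compile the list: it tallies start tags with the standard library's `html.parser` and reports what share of them fall within the 43 elements above. The names `COMMON_TAGS`, `TagCounter`, `tag_coverage` and the `SAMPLE_PAGE` markup are made up for illustration, and the sketch measures tag occurrences rather than how semantically the tags are used.

```python
from collections import Counter
from html.parser import HTMLParser

# The 43 elements from the list above (set name is illustrative).
COMMON_TAGS = {
    "div", "span", "p", "a", "table", "tr", "thead", "tbody", "td", "th",
    "caption", "img", "form", "input", "label", "fieldset", "textarea",
    "select", "style", "option", "legend", "html", "head", "title", "link",
    "meta", "script", "body", "em", "strong", "abbr", "address", "h1", "h2",
    "h3", "ul", "ol", "li", "dl", "dd", "dt", "object", "embed",
}


class TagCounter(HTMLParser):
    """Counts every start tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        self.counts[tag] += 1


def tag_coverage(html_text):
    """Return (per-tag counts, share of tag occurrences in COMMON_TAGS)."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    total = sum(parser.counts.values())
    covered = sum(n for tag, n in parser.counts.items() if tag in COMMON_TAGS)
    share = covered / total if total else 0.0
    return parser.counts, share


# Hypothetical sample page; <blockquote> is deliberately not in the list.
SAMPLE_PAGE = """
<html><head><title>Demo</title></head>
<body>
<h1>Heading</h1>
<p>Some <em>text</em> with a <a href="#">link</a>.</p>
<ul><li>One</li><li>Two</li></ul>
<blockquote><p>A quote.</p></blockquote>
</body></html>
"""

counts, share = tag_coverage(SAMPLE_PAGE)
for tag, n in counts.most_common():
    print(tag, n)
print(f"covered by the list: {share:.0%}")
```

Running the same function over the pages of an actual site and sorting the counts from most to least frequent would give the usual Pareto view: a handful of tags accounting for most of the markup.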

It is interesting to observe that less than half of the overall HTML specification is used on most web sites today.

This fact suggests that almost half of the HTML tag set is obsolete, and also that the HTML specification is arguably sufficient for the purposes it was developed for.

Some web developers may disagree with this statement, feeling that HTML is missing elements which could be introduced into HTML5/XHTML2 to make some common tasks easier and faster to complete.

However, in conversations with developers, I have rarely met one who was able to suggest any new tag, let alone one that was meaningful and did not merely cover the requirements of an average blog site.