Archive for April, 2009

Biggest challenges in creating the semantic web

Monday, April 13th, 2009

(Non-)adoption of one universal standard

One of the main problems with the semantic web is that its standards are not yet developed enough.

We currently face the problem of next-generation HTML being either XHTML2.0 or HTML5 (or both!), which would lead to more technologically disparate user interface implementations across the web.

Very soon we may have interfaces written in HTML4.01, HTML5, XHTML1.0 (Strict, Transitional and Frameset) and XHTML2.0.

That is already quite a 'stew' of code, and one that browsers will find very difficult to render consistently.

It is common today to code a contact page in standards-compliant XHTML1.0 Strict, only to find that it needs to include a Google Map whose code is written in non-standards-compliant HTML4.01.

The end result is a mish-mash that makes no real sense in terms of standards compliance or long-term validity.
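Browsers mainly use a page's doctype declaration to decide which rendering mode to apply, so one quick way to see how mixed a site's 'stew' is would be to check what each page declares. The Python sketch below assumes the pages are available as strings. `KNOWN_DOCTYPES` and `classify_doctype` are hypothetical helpers that cover only the HTML4.01, XHTML1.0 and HTML5 doctypes, and the regular expression is a simplification rather than a full parser.

```python
import re

# Public identifiers of the common W3C doctypes (the HTML5 doctype has none).
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "HTML4.01 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML4.01 Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "HTML4.01 Frameset",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "XHTML1.0 Frameset",
}

# Simplified pattern: '<!DOCTYPE html', optionally followed by PUBLIC "...".
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]+)")?', re.IGNORECASE)


def classify_doctype(markup):
    """Return a label for the doctype a page declares (hypothetical helper)."""
    match = DOCTYPE_RE.search(markup)
    if match is None:
        # Browsers render pages without a doctype in quirks mode.
        return "no doctype"
    public_id = match.group(1)
    if public_id is None:
        # '<!DOCTYPE html>' with no public identifier is the HTML5 form.
        return "HTML5"
    return KNOWN_DOCTYPES.get(public_id, "unknown: " + public_id)


if __name__ == "__main__":
    pages = [
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html></html>',
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html></html>',
        "<!DOCTYPE html><html></html>",
    ]
    for page in pages:
        print(classify_doctype(page))
```

A contact page that embeds third-party map code would still report 'XHTML1.0 Strict' here. The doctype only records what a page claims to be, not what the embedded markup actually conforms to. Checking that requires a validator.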

Machine readability

The aim of semantic web systems is to automate mundane tasks.

This can only be achieved with data structures that machines can read easily and that developers can easily apply to their own content to make it machine readable.

The challenge of automating mundane tasks is no longer about generating reports from a database once a month, but about working out complex connections between people, phrases, terms, web pages and so on.

Systems that automate these tasks successfully (like Google and Facebook) usually end up being very successful, heavily used and profitable.

They are also incredibly scalable and very useful for the end user – imagine for example what the whole web experience would be like if Google did not exist.

Human readability

Human readability is equally important, as standards really ought to be usable by everyday, non-technical people.

This is one of the main reasons why HTML became so popular, and in turn why the Internet as a whole has taken off to such a great extent.

With the emergence of many different standards, the readability and learnability of user interface implementations decrease, which essentially stifles further development of the Internet.

Emerging versions of technologies and standards are increasingly running into problems of poor human readability and lower usability with each iteration.

The more standards exist, the less integrated the semantic web becomes and the more there is for developers to learn.

If an average developer needs to know, for example, 3 CSSs, PHP, XHTML, HTML5, MySQL, CSS and JavaScript with jQuery AJAX in order to implement a modern user interface, there are clearly going to be fewer people developing innovative solutions than when there was only HTML and CSS to worry about.

Standards authorities

There are a number of problems regarding the standards authorities on the web.

Generally speaking, standards authorities fall into the following groups:

  1. Socialists: Human usability proponents who are usually not highly technical or not technical at all
  2. Developers: Machine readability proponents who are usually very technical, but give little or no consideration to usability and user interaction needs and requirements
  3. Corporations: Large corporations, who inevitably want to create 'standards' which only benefit their bottom lines (e.g. Apple, Google, Yahoo, Microsoft, IBM)

Each of these groups has an important role to play in the development of current and future standards, but in practice they do not collaborate enough, so standards tend to drift towards benefiting one group rather than everyone.

The above groups have the following impact on standards development:

  1. Socialists: Promote the best end-user experience and the most collaborative, easy-to-use solutions
  2. Developers: Promote the easiest and most flexible technologies to work with, which offer viable solutions to complex problems of all sorts
  3. Corporations: Impose standards by deploying them across their massive platforms (e.g. Facebook's 200M users give any standard a strong start on the web)

Proper standards can only be achieved by all groups collaborating constantly, but in practice this happens nowhere near as often as it should.

The end result is a web that is more 'broken' than 'fixed', with groups of people developing for different platforms depending on what they have decided to support, and most of the time never really supporting everything that should be supported.

Scope for semantic expression

Monday, April 6th, 2009

Many of my discussions with various average developers have centred on the claim that current technologies do not allow enough room for 'semantic expression'.

People complain that there are not enough HTML tags to mark up everything we need to build on the web sites we work on.

People propose new additions to specifications such as HTML that are narrowed down too specifically to particular web site types (see Web site types), forgetting that the web was built upon much more generic underlying standards and principles.

Previously I wrote about Pareto Analysis of HTML and Pareto Analysis of CSS, both of which illustrated the fact that most web site types only require around 20% of the available technologies in order to be successfully implemented.

If HTML specifications were insufficient for proper semantic expression, we would be using 100% of HTML tags all the time and still struggling to express the rest due to shortcomings of the specification.

In reality, we often under-use or, worse, abuse the specifications and standards, and are therefore unable to express ourselves semantically.

Here are some of the building blocks within mainstream web technologies that allow us to express the meaning of the information we mark up on a regular basis.

Meta data

Meta data is one of the least utilised aspects of any web page and one of the most misunderstood, by developers, users and search engines alike (the latter have arguably undermined the purpose of meta data even further by accepting developers' bad practices as a fact of life).

The purpose of meta data is to describe the purpose and meaning of a given web page as a whole.

If search engines like Google designed their ranking algorithms to reward sites with good, properly implemented meta data, the web would be of much higher quality.

Implementing meta data is simple and straightforward, and good CMS solutions support it.

With further extensions to meta data standards (see Dublin Core), it would be possible to implement any document exchange mechanism across various web systems, which would give the semantic web enormously powerful leverage.
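To show how easily machines can consume meta data, here is a minimal Python sketch that uses only the standard library's `html.parser`. It assumes the page follows the common convention of `<meta name="..." content="...">` elements, with Dublin Core elements prefixed `DC.` and declared through a `schema.DC` link. The helper names `read_meta` and `dublin_core` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing XHTML tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def read_meta(html):
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


def dublin_core(meta):
    # Keep only the Dublin Core elements, e.g. DC.title and DC.creator.
    return {name: value for name, value in meta.items() if name.startswith("DC.")}


PAGE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="description" content="Contact details and opening hours" />
<meta name="DC.title" content="Contact us" />
<meta name="DC.creator" content="Example Ltd" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(dublin_core(read_meta(PAGE)))
```

The parser never has to guess at page layout, so the same few lines would work on any page that uses meta data consistently.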

CSS class names

Class names are also misunderstood by developers.

One of the main cornerstones of Object Oriented development is the notion that a class name becomes a living part of the system.

A reusable piece of code named 'str2int', for example, although short and concise, isn't necessarily easy to understand, remember and work with once it becomes widely accepted and reused.

Similarly, naming CSS classes 'block', 'left', 'morePadding' or 'left_margin' does not give the names long-term acceptability and does not make the overall solution any easier to understand.

Class names like ‘product’, ‘hCard’, ‘organisation’ or ‘component’ are much more useful and can be made part of any system and reused over and over just as though they were part of the official HTML specification.

This is in line with general best practices of Object Oriented Software Development and makes both machine and human sense in the short, medium and long term.
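One way to apply this in practice would be a simple review check that flags class names describing appearance rather than meaning. The Python sketch below is a heuristic. The `PRESENTATIONAL_WORDS` set is a hypothetical, deliberately short list, and a real project would tune it to its own conventions.

```python
import re

# Hypothetical list of words that describe appearance or position, not meaning.
PRESENTATIONAL_WORDS = {
    "left", "right", "top", "bottom", "block",
    "padding", "margin", "big", "small", "red", "blue", "bold",
}


def words_in(class_name):
    """Split 'morePadding', 'left_margin' or 'main-nav' into lower-case words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", class_name)
    return [word.lower() for word in re.split(r"[\s_\-]+", spaced) if word]


def is_presentational(class_name):
    return any(word in PRESENTATIONAL_WORDS for word in words_in(class_name))


if __name__ == "__main__":
    for name in ["block", "left", "morePadding", "left_margin",
                 "product", "hCard", "organisation", "component"]:
        verdict = "presentational" if is_presentational(name) else "semantic"
        print(f"{name}: {verdict}")
```

With the word list above, the four names criticised in the text are flagged and the four recommended ones pass.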

HTML tags

HTML tags themselves are highly semantic and serve a very good purpose for web documents.

Developers often abuse the tags: using paragraphs for layout, failing to mark up lists properly, using <fieldset> for purposes that <div>s are intended for, and so on.

These abuses of the specification inevitably lead to bad semantics, a lack of standards compliance and the perception of a 'lack of tags', even though most problems are easily resolved by applying proper coding principles.
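Some of these abuses can be spotted mechanically. The following Python sketch, built on the standard library's `html.parser`, flags two of the patterns mentioned above. The first is an empty paragraph used as a spacer. The second is a `<fieldset>` without a `<legend>`, which often suggests a `<fieldset>` standing in for a `<div>`. Both rules are heuristic assumptions rather than validation rules, because neither pattern is necessarily invalid markup. The `TagAbuseLinter` and `lint` names are hypothetical.

```python
from html.parser import HTMLParser


class TagAbuseLinter(HTMLParser):
    """Heuristic checks for two common tag abuses (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.warnings = []
        self._p_text = None             # text inside the open <p>, None when outside one
        self._fieldset_has_legend = []  # one flag per currently open <fieldset>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._p_text = []
        elif tag == "fieldset":
            self._fieldset_has_legend.append(False)
        elif tag == "legend" and self._fieldset_has_legend:
            self._fieldset_has_legend[-1] = True

    def handle_data(self, data):
        if self._p_text is not None:
            self._p_text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._p_text is not None:
            # &nbsp; arrives as '\xa0', which str.strip() treats as whitespace.
            # Caveat: a <p> holding only an image would also be flagged.
            if not "".join(self._p_text).strip():
                self.warnings.append("empty <p>, probably used for spacing")
            self._p_text = None
        elif tag == "fieldset" and self._fieldset_has_legend:
            if not self._fieldset_has_legend.pop():
                self.warnings.append("<fieldset> without <legend>, possibly standing in for a <div>")


def lint(html):
    linter = TagAbuseLinter()
    linter.feed(html)
    linter.close()
    return linter.warnings


if __name__ == "__main__":
    sample = ("<p>&nbsp;</p><fieldset><div>Address</div></fieldset>"
              "<fieldset><legend>Contact</legend></fieldset><p>Hello</p>")
    for warning in lint(sample):
        print(warning)
```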

HTML structure

Sometimes one or two HTML tags are by no means enough to mark up a more complex piece of data.

In many instances, a paragraph of text contains a few acronyms which need to be marked up additionally.

This is what HTML is all about: combining atomic-level tags into complex data structures.

Creating an XHTML page is essentially no different to creating an XML data structure.

It ought to be completely reusable at all levels and make as much sense as possible in and out of context.

This is the ultimate goal of excellent information architecture at technical and non-technical levels.
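Because XHTML is XML, a structure like the acronym example can be checked and reused with any XML parser. This Python sketch uses the standard `xml.etree.ElementTree` module. The `is_well_formed` and `abbreviations` helpers and the sample fragment are hypothetical. A plain XML parser will also reject HTML named entities such as `&nbsp;` unless the DTD is processed.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    '<p>The <abbr title="World Wide Web Consortium">W3C</abbr> publishes the '
    '<abbr title="Cascading Style Sheets">CSS</abbr> specifications.</p>'
)


def is_well_formed(fragment):
    """True if the fragment parses as XML (every tag closed and properly nested)."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


def abbreviations(fragment):
    """Map each abbreviation to its expansion, accepting <abbr> and <acronym>."""
    root = ET.fromstring(fragment)
    return {
        element.text: element.get("title")
        for element in root.iter()
        if element.tag in ("abbr", "acronym")
    }


if __name__ == "__main__":
    print(is_well_formed(FRAGMENT))                   # True
    print(is_well_formed("<p>Unclosed <b>bold</p>"))  # False
    print(abbreviations(FRAGMENT))
```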

URL structure

An often overlooked aspect of semantic expression is the URL structure itself.

A link like www.flexewebs.com/p.php?id=12 does not mean anything, while www.flexewebs.com/about-us makes much more sense to humans and computers.

URL structures could be heavily standardised (see Cool URIs for the Semantic Web) and could help machines and humans navigate the web more easily, quickly and reliably.

URL structures are also an incredibly powerful means of shaping the overall architecture of a web system, which the Flickr team used when creating their system: they started the overall design from the URL design.
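A common way to get from the first URL style to the second is to derive a readable 'slug' from each page title. The Python sketch below uses a hypothetical `titles_by_id` lookup to stand in for whatever a CMS would provide. `slugify` and `clean_url_for` are illustrative helpers, not part of any particular framework.

```python
import re
import unicodedata
from urllib.parse import parse_qs, urlparse


def slugify(title):
    """'About Us' -> 'about-us': ASCII letters and digits joined by hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def clean_url_for(legacy_url, titles_by_id):
    """Map an id-based URL such as /p.php?id=12 to a readable path."""
    query = parse_qs(urlparse(legacy_url).query)
    page_id = int(query["id"][0])
    return "/" + slugify(titles_by_id[page_id])


if __name__ == "__main__":
    titles_by_id = {12: "About Us"}  # hypothetical CMS lookup
    print(clean_url_for("http://www.flexewebs.com/p.php?id=12", titles_by_id))
```

In a real system, the old id-based address would typically answer with a permanent (301) redirect to the new path so that existing links keep working.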

XML

XHTML is an application of XML (HTML reformulated as XML), and that is the main reason why XHTML should be the preferred technology choice over HTML.

XHTML provides systems with a practical ability to offer ‘user interfaces as APIs’, where any system should be able to harvest data from a well constructed XHTML web system and reuse that information elsewhere if needed.

Knowing and understanding best practices in XML is therefore absolutely critical to creating high-quality, scalable, semantic web user interfaces that retain their value over a long period of time.
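To make the 'user interfaces as APIs' idea concrete, here is a Python sketch that treats a well-formed XHTML product list as a data source, using `xml.etree.ElementTree` from the standard library. The class names `product`, `name` and `price` and the sample markup are hypothetical. The point is only that consistent, semantic markup can be read back as structured records without a separate API.

```python
import xml.etree.ElementTree as ET

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">19.99</span></li>
</ul>
</body></html>"""


def has_class(element, class_name):
    # The class attribute can hold several space-separated names.
    return class_name in (element.get("class") or "").split()


def harvest(xhtml, item_class="product", fields=("name", "price")):
    """Turn every element carrying item_class into a dict of its field texts."""
    root = ET.fromstring(xhtml)
    records = []
    for item in root.iter():
        if not has_class(item, item_class):
            continue
        record = {}
        for child in item.iter():
            for field in fields:
                if has_class(child, field):
                    record[field] = "".join(child.itertext()).strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for record in harvest(PAGE):
        print(record)
```

Because the sample uses the default XHTML namespace, element tags come back as `{http://www.w3.org/1999/xhtml}li` and so on. Matching on class names sidesteps that, but a harvester that matches on tag names would need to include the namespace.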

Content

Last but not least, the content of a given widget is the icing on the cake of semantic expression.

After we have marked up our widget with all the necessary structural markup, the content it contains gives the widget its final meaning and purpose.

Writing good and relevant content for the web should therefore also be considered an important part of building semantic web user interfaces.

The content ought to make as much sense out of context as the code does, and achieving that is what gives the final solution its real power.

Pareto analysis of CSS

Friday, April 3rd, 2009

Listed below are the most widely used CSS properties, together with the most widely used pseudo-classes and pseudo-elements.

These are the most used mainly because most of them are supported in all browsers, they are the quickest and shortest to write, and they make the most sense when designing usable and accessible interfaces.

Once again, the Pareto principle shows itself in much the same way as it did with HTML (see Pareto analysis of HTML), which implies that the CSS specification is possibly more extensive than it needs to be for most web site types.

  1. :active
  2. :after
  3. :before
  4. :first-child
  5. :first-letter
  6. :first-line
  7. :focus
  8. :hover
  9. :lang
  10. :link
  11. :visited
  12. background
  13. background-attachment
  14. background-color
  15. background-image
  16. background-position
  17. background-repeat
  18. border
  19. border-bottom
  20. border-bottom-color
  21. border-bottom-style
  22. border-bottom-width
  23. border-collapse
  24. border-color
  25. border-left
  26. border-left-color
  27. border-left-style
  28. border-left-width
  29. border-right
  30. border-right-color
  31. border-right-style
  32. border-right-width
  33. border-spacing
  34. border-style
  35. border-top
  36. border-top-color
  37. border-top-style
  38. border-top-width
  39. border-width
  40. bottom
  41. caption-side
  42. clip
  43. color
  44. counter-increment
  45. counter-reset
  46. cursor
  47. direction
  48. display
  49. empty-cells
  50. float
  51. font
  52. font-family
  53. font-size
  54. font-size-adjust
  55. font-stretch
  56. font-style
  57. font-variant
  58. font-weight
  59. left
  60. letter-spacing
  61. line-height
  62. list-style
  63. list-style-image
  64. list-style-position
  65. list-style-type
  66. margin
  67. margin-bottom
  68. margin-left
  69. margin-right
  70. margin-top
  71. max-height
  72. max-width
  73. min-height
  74. min-width
  75. outline
  76. outline-color
  77. outline-style
  78. outline-width
  79. overflow
  80. padding
  81. padding-bottom
  82. padding-left
  83. padding-right
  84. padding-top
  85. position
  86. right
  87. table-layout
  88. text-align
  89. text-decoration
  90. text-indent
  91. text-transform
  92. top
  93. visibility
  94. white-space
  95. word-spacing
  96. z-index

Based on the projects I have worked on so far (small and large) and various analyses I have done of other web sites in the wild, the declarations listed above are the most commonly used.
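This kind of analysis could be approximated by counting how often each property appears in a site's stylesheets and then finding the smallest set of properties that accounts for most declarations. The Python sketch below is a rough approximation with several assumptions. It counts properties only, not selectors such as `:hover`. It uses regular expressions rather than a real CSS parser, so values containing semicolons (such as some data URIs) would confuse it. The 80% threshold in the hypothetical `pareto_core` helper is a parameter, not a measured result.

```python
import re
from collections import Counter


def property_counts(css):
    """Count how often each CSS property is declared (rough, regex-based)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop comments
    counts = Counter()
    # Innermost {...} blocks hold declarations; selectors such as a:hover sit outside them.
    for block in re.findall(r"\{([^{}]*)\}", css):
        for declaration in block.split(";"):
            name, colon, _value = declaration.partition(":")
            if colon and name.strip():
                counts[name.strip().lower()] += 1
    return counts


def pareto_core(counts, share=0.8):
    """Smallest list of most-used properties covering `share` of all declarations."""
    total = sum(counts.values())
    core, covered = [], 0
    for name, count in counts.most_common():
        if covered >= share * total:
            break
        core.append(name)
        covered += count
    return core


SAMPLE_CSS = """
/* layout */
body { margin: 0; padding: 0; color: #333 }
a:hover { color: red; }
p { margin: 0 0 1em; color: #000; line-height: 1.5 }
"""

if __name__ == "__main__":
    counts = property_counts(SAMPLE_CSS)
    print(counts.most_common())
    print(pareto_core(counts))
```

On a toy stylesheet like this one, the result says little. The Pareto pattern, if present, would only show up when the counts are aggregated over many real stylesheets.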

It does not mean that you are likely to use all of these declarations on all web sites.

Some declarations are also used for what are essentially the wrong purposes. A good example is using line-height to position elements vertically, which over time can become a counter-productive approach on larger web sites.

Nevertheless, line-height is a worthwhile declaration to use, for example to make text easier to read, as it is generally agreed that some extra spacing between lines of text makes it easier to read on screen.