Archive for the ‘Overview’ Category

Biggest challenges for creating the semantic web

Monday, April 13th, 2009

(Non-)adoption of one universal standard

One of the main problems with the semantic web is that its standards are not yet developed enough.

We currently have the problem that the next generation of HTML may be either XHTML2.0 or HTML5 (or both!), which would lead to even more technologically disparate user interface implementations across the web.

Very soon we may have interfaces written in HTML4.01, HTML5, XHTML1.0 (Strict, Transitional and Frameset) and XHTML2.0.

This is already quite a ‘stew’ of code, and one that browsers will find very difficult to render consistently.

It is common today to code a contact page in standards compliant XHTML1.0 Strict, only for it to need a Google Map whose code is written in non-standards compliant HTML4.01.

The end result is a mish-mash which makes little sense in terms of standards or long term validity.

Machine readability

The aim of semantic systems is to automate mundane tasks.

This can only be achieved with data structures which machines can read easily and which developers can easily work with.

The challenge of automating mundane tasks is no longer about generating a report from a database once a month, but about working out complex connections between people, phrases, terms, web pages and so on.

The systems which automate these tasks successfully (like Google and Facebook) usually end up being very successful, widely used and profitable.

They are also incredibly scalable and very useful for the end user – imagine for example what the whole web experience would be like if Google did not exist.

Human readability

Human readability is equally important, as standards really ought to be usable by everyday, non-technical people.

This is one of the main reasons why HTML has become so incredibly popular, and why the Internet as a whole has taken off to such a great extent.

As more and more different standards emerge, the readability and learnability of user interface code decline, and this ultimately stifles further development of the Internet.

Emerging versions of technologies and standards are increasingly running into problems of poor human readability and lower usability with each iteration.

The more standards exist, the less integrated the semantic web becomes and the more there is for developers to learn.

If an average developer needs to know, for example, CSS3, PHP, XHTML, HTML5, MySQL, CSS and JavaScript with jQuery and AJAX in order to implement a modern user interface, it is clear that fewer people will be developing innovative solutions than when there was only HTML and CSS to worry about.

Standards authorities

There are a number of problems regarding the standards authorities on the web.

Generally speaking, standards authorities fall into the following groups:

  1. Socialists: Human usability proponents who are usually not highly technical or not technical at all
  2. Developers: Machine readability proponents who are usually very technical, but have little or no consideration for usability and user interaction needs and requirements
  3. Corporations: Large corporations, who inevitably want to create ‘standards’ which only benefit their bottom lines (e.g. Apple, Google, Yahoo, Microsoft, IBM, etc.)

Each of these groups has an important role to play in developing current and future standards, but in practice they do not collaborate enough, so standards tend to favour one of these groups rather than everyone.

The above groups have the following impact on standards development:

  1. Socialists: Promote the best end user experience and the most collaborative, easy to use solutions
  2. Developers: Promote the easiest and most flexible technologies to work with, offering viable solutions to complex problems of all sorts
  3. Corporations: Impose standards by deploying them across their massive platforms (e.g. Facebook’s 200M users are a good head start for deploying any standard on the web)

The only way proper standards can be achieved is through constant collaboration between all of these groups, but in practice this happens nowhere near as often as it should.

The end result is a web that is more ‘broken’ than ‘fixed’, with groups of people developing for different platforms depending on what they have decided to support, and rarely supporting everything that should be supported.

Scope for semantic expression

Monday, April 6th, 2009

In many discussions I have had with average developers, a recurring complaint has been that current technologies do not allow enough room for ‘semantic expression’.

People complain that there are not enough HTML tags to mark up everything we need on the web sites we build.

People propose new additions to specifications such as HTML which are tied too narrowly to particular web site types (see Web site types), forgetting that the web was built upon much more generic underlying standards and principles.

Previously I wrote about Pareto Analysis of HTML and Pareto Analysis of CSS, both of which illustrated the fact that most web site types only require around 20% of the available technologies in order to be successfully implemented.

If HTML specifications were insufficient for proper semantic expression, we would be using 100% of HTML tags all the time and would still be struggling to express the rest because of shortfalls in the specification.

In reality, we often under-use, or worse, abuse the specifications and standards, and that is why we are unable to express ourselves semantically.

Here are some of the building blocks which, within mainstream web technologies, allow us to express the meaning of the information we mark up on a regular basis.

Meta data

Meta data is one of the least utilised aspects of any web page and one of the most misunderstood by everyone, including developers, users and search engines (which have arguably undermined the purpose of meta data further by accepting developers’ bad practices as a fact of life).

Meta data exists to describe the purpose and meaning of a web page as a whole.

If search engines like Google designed their ranking algorithms to reward sites with good, properly implemented meta data, the web would be of much higher quality.

Implementing meta data is simple and straightforward, and good CMS solutions support it.

With extended meta data standards (see Dublin Core), it would be possible to implement a general document exchange mechanism across different web systems, which would give the semantic web enormously powerful leverage.
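To illustrate why consistent meta data matters to machines, here is a minimal Python sketch of how a harvester might read Dublin Core elements from a page. It assumes the common convention of naming Dublin Core meta elements with a ‘DC.’ prefix (declared through a ‘schema.DC’ link); the sample page and the read_dublin_core function are hypothetical, and only the standard library’s html.parser is used.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collects <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Only names using the conventional "DC." prefix are treated as Dublin Core.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()  # "DC.title" -> "title"
            self.metadata.setdefault(element, []).append(content)


def read_dublin_core(html_text):
    """Return a dict mapping each Dublin Core element to the values found."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical page head for illustration.
page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="About Flexewebs" />
<meta name="DC.creator" content="Flexewebs" />
<meta name="DC.language" content="en-GB" />
<meta name="description" content="Ordinary meta data, ignored here" />
</head><body></body></html>"""

print(read_dublin_core(page))
# {'title': ['About Flexewebs'], 'creator': ['Flexewebs'], 'language': ['en-GB']}
```

Values are kept in lists because elements such as DC.subject may legitimately appear more than once. A page that omits or misuses these tags simply yields nothing useful, which is the problem described above.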

CSS class names

Class names are also misunderstood by developers.

One of the main cornerstones of Object Oriented development is the notion that a class name becomes a living part of the system.

A reusable piece of code named ‘str2int’, for example, is short and concise, but it is not necessarily easy to understand, remember and work with once it becomes widely accepted and reused.

Similarly, naming CSS classes ‘block’, ‘left’, ‘morePadding’ or ‘left_margin’ describes presentation rather than meaning, so such names do not stand the test of time and do not make the overall solution any easier to understand.

Class names like ‘product’, ‘vcard’ (the root class of the hCard micro format), ‘organisation’ or ‘component’ are much more useful and can be made part of any system and reused over and over, just as though they were part of the official HTML specification.

This is in line with general best practice in Object Oriented software development and makes sense to both machines and humans in the short, medium and long term.

HTML tags

HTML tags themselves are highly semantic and serve a very good purpose for web documents.

Developers often abuse tags: using paragraphs for layout, failing to mark up lists as lists, using <fieldset> where a <div> is intended, and so on.

These abuses of the specification inevitably lead to poor semantics, lack of standards compliance and a perceived ‘lack of tags’, even though most problems are easily resolved by following proper coding principles.

HTML structure

Sometimes one or two HTML tags are not nearly enough to mark up a more complex piece of data.

For instance, a paragraph of text may often contain a few acronyms which need additional mark-up of their own.

This is what HTML is all about: combining atomic level tags into complex data structures.

Creating an XHTML page is essentially no different to creating an XML data structure.

It ought to be completely reusable at all levels and make as much sense as possible in and out of context.
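Because XHTML is XML, the acronym example above can be read by any XML parser. The Python sketch below uses the standard library’s xml.etree.ElementTree to check that a fragment is well-formed and to pull out each acronym or abbreviation together with the expansion in its title attribute. The fragment and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A hypothetical XHTML fragment: a paragraph whose acronyms carry their own mark-up.
fragment = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p>Pages written in
    <acronym title="HyperText Markup Language">HTML</acronym> should follow
    <abbr title="World Wide Web Consortium">W3C</abbr> recommendations.</p>
</div>"""


def is_well_formed(markup):
    """XHTML must parse as XML; tag soup such as mismatched tags will not."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def expansions(element):
    """Map each acronym/abbreviation in the tree to the expansion in its title."""
    found = {}
    for tag in ("acronym", "abbr"):
        # Elements in the XHTML namespace are addressed as "{namespace}tag".
        for node in element.iter(f"{{{XHTML_NS}}}{tag}"):
            found[node.text] = node.get("title")
    return found


root = ET.fromstring(fragment)
print(expansions(root))
# {'HTML': 'HyperText Markup Language', 'W3C': 'World Wide Web Consortium'}
print(is_well_formed("<p>unclosed <b>bold</p>"))  # False
```

The same parser that reads a data feed can read the page, which is what makes XHTML reusable in and out of context.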

This is the ultimate goal of excellent information architecture at technical and non-technical levels.

URL structure

An often overlooked aspect of semantic expression is URL structure itself.

A link like www.flexewebs.com/p.php?id=12 does not mean anything, while www.flexewebs.com/about-us makes much more sense to humans and computers.

URL structures could be heavily standardised (see Cool URIs for the Semantic Web) and could help machines and humans navigate the web more easily, quickly and reliably.

URL structure is also an incredibly powerful tool for designing the overall architecture of a web system, and the Flickr team used it in exactly this way: they started their overall design from the URLs.
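As a rough illustration of designing from the URLs first, the sketch below turns page titles into readable path segments and builds a routing table, so that a page stored under an ID such as 12 is served at /about-us rather than p.php?id=12. The slugify helper, the page titles and every ID other than 12 are hypothetical.

```python
import re
import unicodedata


def slugify(title):
    """Turn a page title into a readable path segment: 'About Us' -> 'about-us'."""
    # Fold accented letters to plain ASCII ('Café' -> 'Cafe') and drop anything else.
    ascii_title = (unicodedata.normalize("NFKD", title)
                   .encode("ascii", "ignore").decode("ascii"))
    # Replace each run of characters other than letters and digits with one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


# Hypothetical CMS pages keyed by database ID (cf. p.php?id=12).
pages = {12: "About Us", 15: "Contact Us", 21: "Café Menu"}

# Route table from readable URL to page ID.
routes = {"/" + slugify(title): page_id for page_id, title in pages.items()}
print(routes)  # {'/about-us': 12, '/contact-us': 15, '/cafe-menu': 21}
```

A real system would also need to resolve clashes when two titles produce the same slug, and to redirect old ID-based URLs to the new ones.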

XML

XHTML is HTML reformulated as an application of XML, and that is the main reason why XHTML should be the preferred technology choice over HTML.

XHTML gives systems a practical way to offer ‘user interfaces as APIs’: any system should be able to harvest data from a well constructed XHTML web system and reuse that information elsewhere if needed.
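To show what ‘user interfaces as APIs’ can mean in practice, here is a minimal Python sketch that harvests product offers from a well-formed XHTML page using nothing but semantic class names. The class names ‘product’, ‘name’ and ‘price’ are hypothetical conventions for this example rather than an official micro format, and the XML parser will reject any page that is not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue page using semantic class names.
catalogue_page = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <ul>
    <li class="product"><span class="name">Blue widget</span>
      <span class="price">9.99</span></li>
    <li class="product featured"><span class="name">Red widget</span>
      <span class="price">12.50</span></li>
  </ul>
</body>
</html>"""


def has_class(element, name):
    """The class attribute is a space-separated list, so split it before comparing."""
    return name in element.get("class", "").split()


def harvest_products(xhtml_text, fields=("name", "price")):
    """Return one dict per element marked class="product"."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if not well-formed
    products = []
    for element in root.iter():
        if not has_class(element, "product"):
            continue
        record = {}
        for child in element.iter():
            for field in fields:
                if has_class(child, field):
                    record[field] = (child.text or "").strip()
        products.append(record)
    return products


print(harvest_products(catalogue_page))
# [{'name': 'Blue widget', 'price': '9.99'}, {'name': 'Red widget', 'price': '12.50'}]
```

Matching on class names rather than on element positions means the harvester keeps working when the page is restyled or reorganised, which is the practical pay-off of meaningful class names.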

Knowing and understanding best practices in XML is therefore absolutely critical to creating high quality, scalable, semantic web user interfaces which retain their value over a long period of time.

Content

Last but not least, the content of a widget itself is the icing on the cake of semantic expression.

After we have marked up our widget with all the necessary structural markup, the content it contains gives the widget its final meaning and purpose.

Writing good and relevant content for the web should therefore also be considered an important part of building semantic web user interfaces.

The content ought to make as much sense out of context as the code does, and achieving that is what gives the final solution its real power.

Arguments for implementing semantic interfaces

Wednesday, May 28th, 2008

Alongside the arguments against semantics, there are also strong, value-adding arguments for them. These are:

Semantics give meaning to interfaces

This is by far the most important argument for semantics.

Using semantic mark-up on web pages gives web pages meaning!

Many web developers today still misunderstand the basic definition of semantics and tend to think that it is somehow about ‘the look of a page with style sheets turned off’, but, in fact, semantics are all about the meaning of web pages.

The ‘meaning’ is important both for people (end users and developers of interfaces) and for machines (which are just another form of user), such as search engine robots and, in future, tools and languages which harness semantic interfaces.

Semantics add value to interfaces

Semantics add value for all types of users.

Chances are that not all users of a web site have a big screen, a mouse and a powerful PC, know what they are doing and see very well.

In fact, it is certain that your web site has at least one severely disabled user, and one that plays (perhaps) the most important role on your web site: the Google search robot (also known as an indexer).

Semantics aid usability and scalability

Properly developed semantic interfaces are very easy to scale up, re-style and re-organise (all positive and preferable features of any IT system).

Semantic web sites are usually more usable from both the end user’s and the web site developer’s perspective.

Usability usually saves money and earns more. For example, an easy to understand interface is easier to maintain and upgrade, which saves developer time and therefore saves money for an organisation.

One example of a usability feature offered by a semantic interface is the ability to call a telephone number on a web page directly from Skype, if that telephone number has been marked up as a micro format.

One click and the end user is directly in a telephone conversation with your company’s sales department, generating business for your organisation.
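The harvesting side of this can be sketched as follows. The Python code below is a hypothetical example, not a description of how Skype or any other client actually works: it finds elements with class ‘tel’ inside hCard blocks (class ‘vcard’) and turns each number into a standard tel: URI (RFC 3966) which a telephony client could dial. The number cleaning is deliberately naive and assumes numbers are already written in international format; the company name and number are made up.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TelephoneHarvester(HTMLParser):
    """Collects the text of class="tel" elements found inside class="vcard" blocks."""

    def __init__(self):
        super().__init__()
        self.vcard_depth = 0  # > 0 while inside a vcard element
        self.tel_depth = 0    # > 0 while inside a tel element
        self.current = []
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.vcard_depth:
            self.vcard_depth += 1
        elif "vcard" in classes:
            self.vcard_depth = 1
        if self.tel_depth:
            self.tel_depth += 1
        elif self.vcard_depth and "tel" in classes:
            self.tel_depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.tel_depth:
            self.tel_depth -= 1
            if self.tel_depth == 0:
                # Collapse internal whitespace in the collected number.
                self.numbers.append(" ".join("".join(self.current).split()))
        if self.vcard_depth:
            self.vcard_depth -= 1

    def handle_data(self, data):
        if self.tel_depth:
            self.current.append(data)


def to_tel_uri(number):
    """Naively keep '+' and digits: '+44 20 7946 0123' -> 'tel:+442079460123'."""
    return "tel:" + re.sub(r"[^\d+]", "", number)


def harvest_telephones(html_text):
    harvester = TelephoneHarvester()
    harvester.feed(html_text)
    harvester.close()
    return [to_tel_uri(number) for number in harvester.numbers]


sample = ('<div class="vcard"><span class="fn org">Example Ltd</span><br>'
          '<span class="tel">+44 20 7946 0123</span></div>'
          '<p class="tel">Not inside an hCard, so ignored</p>')
print(harvest_telephones(sample))  # ['tel:+442079460123']
```

The depth counters make sure only numbers inside a vcard are picked up, so a stray element that happens to use the class ‘tel’ elsewhere on the page is ignored.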

Increases ROI by increasing findability and re-findability

One of the most important aspects of any web site is that it needs to be findable and re-findable.

Semantic interfaces by their very nature and the fact that they are standards compliant tend to be naturally easier to find, analyse, understand and index for search robots.

The very fact that your interfaces are standards compliant tells search robots that you know what you are doing when it comes to web development, and search engines may well treat this factor favourably.

In future it may be the case that only standards compliant web sites will be considered ‘valid’ solutions for showing up in search engine result pages.

If (for example) your e-commerce web site can be found easily, you will attract more visitors to it, which ought to convert into more sales for your business.

In this way, semantic interfaces become a direct means of increasing a business’s ROI, without implementing anything beyond the standards which have been set out to make our lives simpler and faster while increasing quality.

Semantics enable data sharing between applications

W3C standards alone are not enough to provide semantic interfaces.

Semantic interfaces are extensions to W3C standards and can provide much more meaning than just POSH (Plain Old Semantic HTML).

With the emergence of de facto standards like micro formats, it has become possible to ‘share’ data from one site with another without much trouble.

RSS feeds are also there to aid similar functionality, but micro formats create ‘interfaces as APIs’, meaning that any machine from anywhere could harvest useful data from your web site (such as your product offers) and make it available in the relevant section of their web site.

An excellent example is Google’s shopping section, which mines product offers from web sites across the web and then offers comparison functionality so that users of the Google search engine can find the best deals and shop from the related web sites.

Semantic interfaces can directly enable this type of functionality, and this is arguably part of the reason why web sites like Amazon and Argos are the two most popular e-commerce web sites in the UK.

Semantics enable leveraging of social networks

Apart from becoming more meaningful, the web is becoming more ‘personal’ and catered to groups of people with similar interests.

Social networks play a key role in creating these on-line communities and semantic mark-up is, once again, at the heart of enabling development and utilisation of meaningful social networks that work.

Micro formats, RSS and Atom feeds as well as RDF are just some of the examples of technologies which are making this functionality possible.

Less error prone interfaces

For a long time now I have believed in creating web sites without using browser specific hacks.

My latest development of Flexewebs CMS is a good example of how a nice web site can be created using valid (in this case XHTML1.0 Strict) HTML code and simple CSS rules to build a scalable, fast, cross browser interface without any hacks.

Using XHTML1.0 Strict with W3C standards compliant mark up has always made my development life easier and produced interfaces which are much easier to control with the usual development tools available to us.

More ‘meaningful’ interfaces

Semantic interfaces are also able to communicate a corporate message in a much more meaningful manner than non-semantic interfaces.

Sharing your company’s contact details (for example) using micro formats is a much better approach than doing it in a proprietary, non-standard manner which cannot be easily found by various directories and search engines.
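As a sketch of what publishing contact details as a micro format might look like, the Python function below renders them as hCard mark-up. The class names (vcard, fn, org, adr, street-address, locality, postal-code, tel, email) come from the hCard micro format; the function name and the sample details are hypothetical. Values are HTML-escaped so that characters such as ‘&’ do not break the mark-up.

```python
from html import escape


def company_hcard(name, street, locality, postcode, telephone, email):
    """Render an organisation's contact details as hCard mark-up."""

    def e(value):
        # quote=True also escapes quotes, so values are safe inside attributes too.
        return escape(value, quote=True)

    return (
        '<div class="vcard">\n'
        f'  <span class="fn org">{e(name)}</span>\n'
        '  <div class="adr">\n'
        f'    <span class="street-address">{e(street)}</span>,\n'
        f'    <span class="locality">{e(locality)}</span>\n'
        f'    <span class="postal-code">{e(postcode)}</span>\n'
        '  </div>\n'
        f'  <span class="tel">{e(telephone)}</span>\n'
        f'  <a class="email" href="mailto:{e(email)}">{e(email)}</a>\n'
        '</div>'
    )


# Hypothetical company details for illustration.
print(company_hcard("Smith & Sons Ltd", "1 High Street", "London",
                    "EC1A 1AA", "+44 20 7946 0123", "info@example.com"))
```

Putting ‘fn’ and ‘org’ on the same element is the hCard way of saying that the card describes an organisation rather than a person, which is what directories and search engines need to know.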

Arguments against semantic interfaces

Tuesday, May 27th, 2008

In our day-to-day work on creating good quality web interfaces, we inevitably come across various arguments against using semantic and proper markup to develop interfaces. Some of these arguments are:

Semantic interfaces will not make more money for the organisation

There are many companies out there that do not know what semantics are all about. After all, semantics are only now being taught at universities.

From that lack of understanding stems the lack of belief that semantic interfaces will add more value to business and therefore make more money for the organisation.

It can be almost impossible to bring a client up to date with what is currently happening on the Internet and where it is all heading.

Usually one of the best ways to show someone the value of semantics is to demonstrate a working example of where they have made a difference, but often this does not work, as clients can be sceptical about whether the same or a similar approach would work for their business.

Semantics may only benefit Google and Yahoo

Some companies with a little more insight into web technologies and the activities of the ‘big players’ (Google, Yahoo, Microsoft) may say that adding semantics to web pages is all about helping data mining technologies do a quicker and better job of ‘stealing’ data from web sites and storing it in a highly structured way in their own databases.

Later on, these big players will (one way or another) make money by re-presenting that data to their users in a way which, in the long term, may draw users away from the web site where the data originated.

This argument could potentially prove to be a very good one, but let’s wait and see what happens in the future.

An example is what happened with Google News, where many publishers were annoyed that their content was being harvested by Google to show a list of the latest news headlines.

News publishers were annoyed that Google was comparing their content with competing newspapers’ content, as well as potentially taking their advertising revenue, based on material which the newspapers had written themselves in the first place.

Semantics are a (very) (fast) moving target

One day a micro format may look one way – another day it may change. Some clients may perceive things to be moving forward very fast, but in reality we all know this is not really the case.

If anything, things are moving forward too slowly (look at the HTML5 vs. XHTML2.0 situation).

Clients may feel that, with constant changes affecting web user interfaces, ‘waiting for later’ to implement new features is a good idea. In reality, waiting often costs the company a competitive advantage.

Semantics are not really a standard

Semantics are at best a ‘tribally’ accepted set of ‘common practices’, and semantic developers disagree heavily among themselves about what really constitutes a proper semantic interface.

There is no be-all and end-all document which outlines all the possible situations which web developers could find themselves in from project to project. What should standard implementations then be?!

Yahoo may come up with some common patterns to use when developing for the web, but many of these patterns prove in practice to be overkill or unsuitable for one reason or another. Yahoo’s patterns are often suitable for ‘mega-sites’ like Yahoo itself, but not necessarily for small and medium sites.

Many companies simply do not care about semantics on their web sites

For most companies the age old approach of ‘just get it done’ still rules the business.

Nike made billions out of it by rephrasing it as ‘Just do it’, but in the IT domain this approach is incredibly dangerous: it creates solutions which are virtually impossible to maintain even in the short term and which take many times longer to develop in the first place, with many more people than are really needed.

This is arguably one of the biggest problems web developers may come across today when trying to implement proper solutions.

The bottom line is that clients care about money and time, and web developers (wrongly) jump to the conclusion that implementing semantic, standards compliant solutions will cost the client more money and time. In practice it ought to take less time and money! All it requires is an appropriate approach.

Current situation and value of semantics

Tuesday, May 27th, 2008

Just like anything in life, semantics, and their impact, can be evaluated within a business and it is possible to work out whether they are ‘worth it’ in terms of implementation.

They are worth it if they add more value than they cost to develop.

Arguably one of the most ‘expensive’ outcomes with semantic interfaces is ending up with a poor quality ‘semantic’ web site which does not meet standards, expectations or its purpose.

I would estimate that within Greater London, United Kingdom, there are around 1000 active, truly semantic web developers, most of whom are constantly tied up in roles working on various big web sites.

The question is then: ‘what do companies which do not have access to these developers do in order to create proper semantic interfaces?’. The answer is: ‘they obtain solutions which do not really comply with proper guidelines’.

The reality seems to be that most developers are either incapable of learning semantics properly or simply do not care about semantics.

Both circumstances are bad for companies which are looking to obtain good quality, semantic solutions in order to reap the rewards Web 2.0 can bring to them.

It is also true that companies impose unrealistic deadlines on development of semantic interfaces, which are either impossible to meet or can only be met partially.

This is counter productive for everyone, as it creates a culture of ‘not caring’ amongst developers, while companies tend to blame developers for being incapable of creating solutions which work.

I am also acutely aware that search engines are valuing semantic features on interfaces much more than before and are rewarding solutions which automated tools can easily recognise as containing set pieces of data (such as contact details, addresses, Geo locations, events, etc.)

Many companies also struggle with implementing pixel perfect cross browser solutions, spending many hours on them, while overlooking much more important semantic related factors of their interfaces.

SEO consultancy is often brought in at the very end of the process, rather than from the very start.

Developers unaware of SEO build solutions which they believe are correct and well optimised, which often results in poor Google rankings. Wrapping the logo in an h1, for example, is seen as ‘good practice’ because ‘the logo is the most important aspect of every page’, which is so blatantly untrue that it is arguably not even worth discussing.

Standards compliance

Monday, May 26th, 2008

In many discussions with various developers, I have heard the term ‘standards compliance’ used very loosely, usually to mean that the (X)HTML code of a web site’s pages validates against a W3C validator of some sort.

Unfortunately, I would not consider a web site standards compliant today merely because its (X)HTML passes W3C validation. At best, (X)HTML code validity is a fairly good first step towards achieving ‘standards compliance’.

Here is a quick overview of the different grades of standards compliance which I would consider required, relevant and highly recommended for every web based project.

Good standards compliance

Better standards compliance

Much better

  • Include all the above steps, and also
  • Use microformats wherever possible
  • Ensure at least WCAG AA accessibility compliance
  • Ensure graceful degradation of the code (especially JavaScript)
  • Make sure your pages are below 100KB in size
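A page-size budget like the one in this list is easy to check automatically. The Python sketch below assumes that the 100Kb figure means 100 kilobytes (1KB = 1024 bytes) and that the budget covers the HTML plus the assets it loads; both are assumptions, since the checklist does not say. The helper names are hypothetical and the resource sizes are made-up figures of the kind a browser’s network panel reports.

```python
# Assumption: the checklist's figure read as 100 kilobytes of 1024 bytes each.
PAGE_BUDGET_BYTES = 100 * 1024


def markup_bytes(markup, encoding="utf-8"):
    """Size in bytes of a page's markup once encoded (ignoring HTTP compression)."""
    return len(markup.encode(encoding))


def within_budget(resources, budget=PAGE_BUDGET_BYTES):
    """resources maps each URL the page loads to its size in bytes.

    Returns (fits, total) so the caller can report how close the page is.
    """
    total = sum(resources.values())
    return total <= budget, total


# Made-up sizes for illustration.
example_page = {
    "/index.html": 18_000,
    "/css/site.css": 9_500,
    "/js/site.js": 41_000,
    "/images/logo.png": 12_300,
}
fits, total = within_budget(example_page)
print(fits, total)  # True 80800
```

Returning the total as well as the yes/no answer makes it easy to track how much headroom a page has left as features are added.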

Best

  • All of the above, but also
  • Make sure the web site is usable (not just accessible) with a screen reader, which should also mean that the web site is easily usable by all users without a mouse
  • Create site architecture which is SEO ready from the outset (including pretty URLs)
  • Ensure compatibility with future browsers/clients (mobile phones, games consoles such as the PlayStation, and small laptops)
  • Ensure that the overall solution is very consistently implemented, including common approach to component coding, clear and concise reusable site portions, etc.
  • Ensure (re)use of commonly recognised and well adopted UI design patterns
  • Code with future compatibility in mind, so as to ensure forward compatibility with emerging technologies and new versions of web browsers

The above checklist is by no means definitive or complete, but it is a good starting point for ensuring that a web site is of high quality, fit for today’s web and able to reach all the types of users who visit it day to day.