Thursday 8 October 2009

AJAX content made discoverable...soon

I follow the Official Google Webmaster Central Blog. It can be an interesting read at times, but on other occasions it provides humdrum information on how best to optimise a website, or answers questions which most of us know the answers to already (e.g. recently we had, 'Does page metadata influence Google page rankings?'). However, the latest posting is one of the exceptions. Google have just announced that they are proposing a new standard to make AJAX-based websites indexable and, by extension, discoverable to users. Good ho!

The advent of Web 2.0 has brought about a huge increase in interactive websites and dynamic page content, much of which has been delivered using AJAX ('Asynchronous JavaScript and XML', not a popular household cleaner!). AJAX is great and furnished me with my iGoogle page years ago; but increasingly websites use it to deliver page content which might otherwise be delivered using static web pages in XHTML. This presents a big problem for search engines because content loaded via AJAX is currently un-indexable (if this is a word!), so a lot of it is invisible to all search engines. Indeed, the latest web design mantra has been "don't publish in AJAX if you want your website to be visible". (There are also accessibility and usability issues, but these are an aside for this posting...)

The Webmaster Blog summarises:

"While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines."

Google's proposal shifts the responsibility for making an AJAX website indexable onto the administrator/webmaster of the website, who would need to set up a headless browser on the web server. (A headless browser is essentially a browser without a user interface: a piece of software that can access web documents but does not display them to human users.) The headless browser would then be used to programmatically access the AJAX website on the server and provide an HTML 'snapshot' to search engines when they request it, which is a clever idea. The crux of Google's proposal is a set of URL conventions. These would tell the search engine when to request the headless browser's output (i.e. the HTML snapshot) and determine which URL is shown to human users.
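To make the URL side of this a bit more concrete, here is a minimal Python sketch of how such a mapping might work. It assumes the '#!' marker and the '_escaped_fragment_' query parameter described in Google's proposal, neither of which is spelled out in this posting. The function names and the example.com URLs are invented for illustration, and the escaping is deliberately simplified.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Assumed parameter name, taken from Google's proposal rather than this posting.
MARKER = "_escaped_fragment_="


def to_crawler_url(pretty_url: str) -> str:
    """Map the '#!' URL shown to humans to the URL a crawler would request."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith("!"):
        return pretty_url  # no '#!' marker, so nothing to rewrite
    # Simplified escaping: '&' in the AJAX state must not be read as a
    # query-string separator, so everything except '=' is percent-encoded.
    token = MARKER + quote(parts.fragment[1:], safe="=")
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_pretty_url(crawler_url: str) -> str:
    """Map the crawler's URL back to the '#!' URL revealed to human users."""
    parts = urlsplit(crawler_url)
    kept, state = [], None
    for pair in parts.query.split("&") if parts.query else []:
        if pair.startswith(MARKER):
            state = unquote(pair[len(MARKER):])
        else:
            kept.append(pair)
    if state is None:
        return crawler_url  # not a snapshot request
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "!" + state)
    )


if __name__ == "__main__":
    print(to_crawler_url("http://example.com/page#!state=photos"))
    print(to_pretty_url("http://example.com/page?_escaped_fragment_=state=photos"))
```

In a setup like this, the web server would hand any request carrying the '_escaped_fragment_' parameter to the headless browser and return its HTML snapshot, while ordinary visitors would only ever see the '#!' form of the address.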

It's good that Google are taking the initiative; my only concern is that they might start trying to re-write standards, as they have done a little with RDFa. Their slides are below - enjoy!
