Friday 26 June 2009

Read all about it: interesting contributions at ISKO-UK 2009

I had the pleasure of attending the ISKO-UK 2009 conference earlier this week at University College London (UCL), organised in association with the Department of Information Studies. This was my first visit to the home of the architect of Utilitarianism, Jeremy Bentham, and my first to the nearby St. Pancras International since it was revamped - and what a smart train station it is.

The ISKO conference theme was 'content architecture', with a particular focus on:
  • "Integration and semantic interoperability between diverse resources – text, images, audio, multimedia
  • Social networking and user participation in knowledge structuring
  • Image retrieval
  • Information architecture, metadata and faceted frameworks"
The themes underlying most papers related to the Semantic Web, Linked Data, and other Semantic Web-inspired approaches to resolving or ameliorating common problems within our disciplines. A great many interesting papers were delivered and it is difficult to say something about them all; however, for me, there were particular highlights (in no particular order)...

Libo Eric Si (et al.) from the Department of Information Science at Loughborough University described research to develop a prototype middleware framework for linking disparate terminology resources, in order to facilitate subject cross-browsing of information and library portal systems. A lot of work has already been undertaken in this area (see, for example, the HILT project (a project in which I used to be involved) and CrissCross), so it was interesting to hear about his 'bag' approach in which – rather than using precise mappings between different Knowledge Organisation Systems (KOS) (e.g. thesauri, subject heading lists, taxonomies, etc.) - "a number of relevant concepts could be put into a 'bag', and the bag is mapped to an equivalent DDC concept. The bag becomes a very abstract concept that may not have a clear meaning, but based on the evaluation findings, it was widely-agreed that using a bag to combine a number of concepts together is a good idea".
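
To make the 'bag' idea concrete, here is a very rough Python sketch – the concepts and the DDC notation are invented for illustration and are not taken from Si's prototype – showing several related concepts from different KOS grouped into a single bag which is then mapped, as a whole, to one DDC concept:

# Illustrative sketch only: several KOS concepts grouped into a 'bag',
# and the bag mapped to a single DDC concept (all identifiers invented).
bags = {
    "ddc:025.04": {                       # information storage and retrieval
        "lcsh:Information retrieval",
        "aat:information retrieval systems",
        "unesco:Information processing",
    },
}

def ddc_for(concept):
    # Return the DDC notation(s) whose bag contains the given concept
    return [ddc for ddc, bag in bags.items() if concept in bag]

print(ddc_for("lcsh:Information retrieval"))   # -> ['ddc:025.04']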

Brian Matthews (et al.) reported on an evaluation of social tagging and KOS. In particular, they investigated ways of enhancing social tagging via KOS, with a view to improving the quality of tags and, in turn, retrieval performance. A detailed and robust methodology was provided, but essentially groups of participants were given the opportunity to tag resources using free-text tags, controlled terms (i.e. from KOS), or terms displayed in a tag cloud, all within a specially designed demonstrator. Participants were later asked to try the alternative tools in order to gather data on the nature of user preferences. There are numerous findings - and a pre-print of the paper is already available on the conference website so you can read these yourself - but the main ones can be summarised from their paper as follows, and were surprising in some cases:
  • "Users appreciated the benefits of consistency and vocabulary control and were potentially willing to engage with the tagging system;
  • There was evidence of support for automated suggestions if they are appropriate and relevant;
  • The quality and appropriateness of the controlled vocabulary proved to be important;
  • The main tag cloud proved problematic to use effectively; and,
  • The user interface proved important along with the visual presentation and interaction sequence."
The user preference for controlled terms was reassuring. In fact, as Matthews et al. report:
"There was general sentiment amongst the depositors that choosing terms from a controlled vocabulary was a "Good Thing" and better than choosing their own terms. The subjects could overall see the value of adding terms for information retrieval purposes, and could see the advantages of consistency of retrieval if the terms used are from an authoritative source."
Chris Town from the University of Cambridge Computer Laboratory presented two (see [1], [2]) equally interesting papers relating to image retrieval on the Web. Although images and video now comprise the majority of Web content, most retrieval systems essentially use the text, tags, etc. that surround images in order to make assumptions about what an image might be. Of course, using any major search engine we discover that this approach is woefully inaccurate. Dr. Town has developed improved approaches to content-based image retrieval (CBIR) which provide a novel way of bridging the 'semantic gap' between the retrieval model used by the system and that of the user. His approach is founded on the "notion of an ontological query language, combined with a set of advanced automated image analysis and classification models". This approach has been so successful that he has founded his own company, Imense. The difference in performance between Imense and Google is staggering and has to be seen to be believed. Examples can be found in his presentation slides (which will be on the ISKO website soon), but can also be observed by simply messing around on the Imense Picture Search.

Chris Town's second paper essentially explored how best to do the CBIR image processing required for the retrieval system. According to Dr. Town there are approximately 20 billion images on the web, with the majority at a high resolution, meaning that by his calculation it would take 4000 years to undertake the necessary CBIR processing to facilitate retrieval! Phew! Large-scale grid computing options therefore have to be explored if the approach is to be scalable. Chris Town and his colleague Karl Harrison therefore undertook a series of CBIR processing evaluations by distributing the required computational task across thousands of Grid nodes. This distributed approach resulted in the processing of over 25 million high resolution images in less than two weeks, thus making grid processing a scalable option for CBIR.
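
As a back-of-envelope check on those figures (the per-image cost is implied by the numbers quoted in the talk rather than stated directly, so treat this as illustrative arithmetic only):

# Illustrative arithmetic only, using the figures quoted above.
images_on_web = 20_000_000_000        # ~20 billion images
years_single_run = 4000               # Dr Town's estimate for the full job
seconds_per_year = 365 * 24 * 3600

sec_per_image = years_single_run * seconds_per_year / images_on_web
print(f"implied cost: ~{sec_per_image:.1f} seconds per image")       # ~6 s

grid_images = 25_000_000              # processed on the Grid...
grid_seconds = 14 * 24 * 3600         # ...in under two weeks
print(f"sustained Grid rate: ~{grid_images / grid_seconds:.0f} images per second")
# Throughput scales with the number of Grid nodes, which is the whole point.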

Andreas Vlachidis (et al.) from the Hypermedia Research Unit at the University of Glamorgan described the use of 'information extraction' employing Natural Language Processing (NLP) techniques to assist in the semantic indexing of archaeological text resources. Such 'Grey Literature' is a good test bed as more established indexing techniques are insufficient for meeting user needs. The aim of the research is to create a system capable of being "semantically aware" during document indexing. Sounds complicated? Yes – a little. Vlachidis is achieving this by using a core cultural heritage ontology and the English Heritage Thesauri to support the 'information extraction' process, providing "a semantic framework in which indexed terms are capable of supporting semantic-aware access to on-line resources".
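
By way of illustration only – the Glamorgan system uses far richer NLP than this – the basic idea of thesaurus-assisted semantic indexing is that terms found in a document are indexed as concepts rather than strings. A crude sketch, with an invented thesaurus fragment and example URIs:

# Deliberately simple sketch: map terms found in the text to concept URIs,
# so the index entry is a concept rather than a string (all data invented).
thesaurus = {
    "post hole": "http://example.org/heritage/concepts/posthole",
    "hearth": "http://example.org/heritage/concepts/hearth",
    "roman pottery": "http://example.org/heritage/concepts/roman-pottery",
}

def semantic_index(text):
    # Return {matched term: concept URI} for thesaurus terms found in the text
    lowered = text.lower()
    return {term: uri for term, uri in thesaurus.items() if term in lowered}

report = "Excavation revealed a hearth and several post holes cut into the clay."
print(semantic_index(report))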

Perhaps the most interesting aspect of the conference was that it was well attended by people from outside the academic fraternity, and as such there were papers on how their organisations are doing innovative work with a range of technologies, specifications and standards which, to a large extent, remain the preserve of researchers and academics. Papers were delivered by technical teams at the World Bank and Dow Jones, for example. Perhaps the most interesting contribution from the 'real world', though, was that delivered by Tom Scott, a key member of the BBC's online and technology team. Tom is a key proponent of the Semantic Web and Linked Data at the BBC, and his presentation threw light on BBC activity in this area – and rather coincidentally complemented an accidental discovery I made a few weeks ago.

Tom currently leads the BBC Earth project, which aims to bring more of the BBC's Natural History content online and to bring the BBC into the Linked Data cloud, thus enabling intelligent linking, re-use and re-aggregation with what's already available. He provided interesting examples of how the BBC is exposing structured data about all forms of BBC programming on the Web by adopting a Linked Data approach, and he expressed a desire for users to traverse detailed and well connected RDF graphs. Says Tom on his blog:
"To enable the sharing of this data in a structured way, we are using the linked data approach to connect and expose resources i.e. using web technologies (URLs and HTTP etc.) to identify and link to a representation of something, and that something can be person, a programme or an album release. These resources also have representations which can be machine-processable (through the use of RDF, Microformats, RDFa, etc.) and they can contain links for other web resources, allowing you to jump from one dataset to another."
Whilst Tom conceded that this work is small compared to the entire output and technical activity at the BBC, it still constitutes a huge volume of data and is significant owing to the BBC's pre-eminence in broadcasting. Tom even reported that a SPARQL endpoint would be made available to query this data. I had actually hoped to ask Tom a few questions during the lunch and coffee breaks, but he was such a popular guy that in the end I lost my chance, such is the existence of a popular techie from the Beeb.
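
For the curious, a query against such an endpoint might look something like the sketch below. The endpoint URL is hypothetical (at the time of writing the BBC have only said one will be made available); the Programme Ontology and Dublin Core namespaces are the ones the BBC data itself declares, and the Python SPARQLWrapper library does the plumbing:

# Sketch only: hypothetical endpoint, real vocabularies.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/bbc/sparql")   # hypothetical endpoint
sparql.setQuery("""
    PREFIX po: <http://purl.org/ontology/po/>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?episode ?title WHERE {
        ?episode a po:Episode ;
                 dc:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["episode"]["value"], "-", row["title"]["value"])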

Pre-print papers from the conference are available on the proceedings page of the ISKO-UK 2009 website; however, fully peer reviewed and 'added value' papers from the conference are to be published in a future issue of Aslib Proceedings.

Tuesday 16 June 2009

11 June 2009: the day Common Tags was born and collaborative tagging died?

Mirroring the emergence of other Web 2.0 concepts, 2004-2006 witnessed a great deal of hyperbole about collaborative tagging (or 'folksonomies' as they are sometimes known). It is now 2009 and most of us know what collaborative tagging is, so I'll avoid contributing to the pile of definitions already available. The hype subsided after 2006 (how active is Tagsonomy now?), but the implementation of tagging within services of all types didn't; tagging became, and remains, ubiquitous.

The strange thing about collaborative tagging is that when it emerged, the purveyors of its hype (Clay Shirky in particular, but there were many others) drowned out the comments made by many in the information, computer and library sciences. The essence of these comments was that collaborative tagging broke so many of the well established rules of information retrieval that it would never really work in general resource discovery contexts. In fact, collaborative tagging was so flawed on a theoretical level that further exploration of its alleged benefits was considered futile. Indeed, to this day, research has been limited for this reason, and I recall attending a conference in Bangalore at which lengthy discussions ensued about tagging being ineffective and entirely unscalable. For the tagging evangelists, though, these comments simply provided proof that these communities were 'stuck in their ways' and harboured an unwillingness to break with theoretical norms. One of the most irritating aspects of the position adopted by the evangelists was that they relied on the power of persuasion and were never able to point to evidence. Moreover, even their powers of persuasion were lacking because most of them were generally 'technology evangelists' with no real understanding of the theories of information retrieval or knowledge organisation; they were simply being carried along by the hype.

The difficulties surrounding collaborative tagging for general resource discovery are multifarious and have been summarised elsewhere; but one of the intractable problems relates to the lack of vocabulary control or collocation and the effect this has on retrieval recall and precision. The Common Tags website summarises the root problem in three sentences (we'll come back to Common Tags in a moment…):
"People use tags to organize, share and discover content on the Web. However, in the absence of a common tagging format, the benefits of tagging have been limited. Individual things like New York City are often represented by multiple tags (like 'nyc', 'new_york_city', and 'newyork'), making it difficult to organize related content; and it isn’t always clear what a particular tag represents—does the tag 'jaguar' represent the animal, the car company, or the operating system?"
These problems have been recognised since the beginning and were anticipated in the theoretical arguments posited by those in our communities of practice. Research has therefore focused on how searching or browsing tags can be made more reliable for users, either by structuring them, mapping them to existing knowledge structures, or using them in conjunction with other retrieval tools (e.g. supplementing tools based on automatic indexing). In short, tags in themselves are of limited use and the trend is now towards taming them using tried and tested methods. For advocates of Web 2.0 and the social ethos it often promotes, this is really a reversal of the tagging philosophy - but it appears to be necessary.

The root difficulty relates to the use of collaborative tagging in Personal Information Management (PIM). Make no bones about it, tagging originally emerged as a PIM tool and it is here that it has been most successful. I, for example, make good use of BibSonomy to organise my bookmarks and publications. BibSonomy might be like delicious on steroids, but one of its key features is the use of tags. In late 2005 I submitted a paper to the WWW2006 Collaborative Tagging Workshop with a colleague. Submitted at the height of tagging hyperbole, it was a theoretical paper exploring some of the difficulties with tagging as a general resource discovery tool. In particular, we aimed to explore the difficulties in expecting a tool optimised for PIM to yield benefits when used for general resource discovery, and we noted how 'PIM noise' was being introduced into users' results. How could tags that were created to organise a personal collection be expected to provide a reasonable level of recall, let alone precision? Unfortunately it wasn't accepted; but since it scored well in peer review I like to think that the organising committee were overwhelmed by submissions!! (It is also noteworthy that no other collaborative tagging workshops have been held since.)

Nevertheless, the basic thesis remains valid. It is precisely this tension (i.e. PIM vs. general resource discovery) which has compromised the effectiveness of collaborative tagging for anything other than PIM. Whilst patterns can be observed in collaborative tagging behaviour, we generally find that the problems summarised in the Common Tags quote above are insurmountable – and this is simply because tags are used for PIM first and foremost, and often tell us nothing about the intellectual content of the resource ('toPrint' anyone? 'toRead', 'howto', etc.). True – users of tagging systems can occasionally discover similar items tagged by other users. But how useful is this and how often do you do it? And how often do you search tags? I never do any of these things because the results are generally feeble and I'm not particularly interested in what other people have been tagging. Is anyone? So whilst tags have taken off in PIM, their utility in facilitating wider forms of information retrieval has been quite limited.

Common Tags

Last Friday the Common Tags initiative was officially launched. Common Tags is a collaboration between some established Web companies and university research centres, including DERI at the National University of Ireland and Yahoo!. It is an attempt to address the multifarious problems above and to widen the use of tags. Says the Common Tags website:
"The Common Tag format was developed to address the current shortcomings of tagging and help everyone—including end users, publishers, and developers—get more out of Web content. With Common Tag, content is tagged with unique, well-defined concepts – everything about New York City is tagged with one concept for New York City and everything about jaguar the animal is tagged with one concept for jaguar the animal. Common Tag also provides access to useful metadata that defines each concept and describes how the concepts relate to one another. For example, metadata for the Barack Obama Common Tag indicates that he's the President of the United States and that he’s married to Michelle Obama."
Great! But how is Common Tags achieving this? Answer: RDFa. What else? Common Tags enables each tag to be defined using a concept URI taken from Freebase or DBpedia (much like more formal methods, e.g. SKOS/RDF), thus permitting the unique identification of concepts and ameliorating some of our resource discovery problems (see the Common Tags workflow diagram below). A variety of participating social bookmarking websites will also enable users to bookmark using Common Tags (e.g. ZigTag, Faviki, etc.). In short, Common Tags attempts to Semantic Web-ify tags using RDFa/XHTML compliant web pages, and in so doing makes tags more useful in general resource discovery contexts. Faviki even describes them as Semantic Tags and employs the logo strap line, 'tags that make sense'. Common Tags won't solve everything, but at least we will see some improvement in recall and increased precision in certain circumstances, as well as the benefits of Semantic Web integration.
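
Stripped of the RDFa serialisation, the core idea is simply that a tag string is replaced by a reference to a well-defined concept, so that variant spellings collocate and ambiguous strings are disambiguated. A minimal sketch in Python (the DBpedia URIs are real; the mapping itself is invented for illustration and is not the Common Tag format):

# Collocation: several tag variants resolve to one concept URI
variant_tags = {
    "nyc": "http://dbpedia.org/resource/New_York_City",
    "new_york_city": "http://dbpedia.org/resource/New_York_City",
    "newyork": "http://dbpedia.org/resource/New_York_City",
}

# Disambiguation: the same string, tagged with different concepts
ambiguous_tags = {
    ("jaguar", "animal"): "http://dbpedia.org/resource/Jaguar",
    ("jaguar", "car company"): "http://dbpedia.org/resource/Jaguar_Cars",
}

print(variant_tags["nyc"])                      # all three variants collocate
print(ambiguous_tags[("jaguar", "animal")])     # the sense is explicit, not guessed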

So, in summary, collaborative tagging hasn't died, but at least now - at long last - it might become useful for something other than PIM. There is irony in the fact that formal description methods have to be used to improve tag utility, but will the evangelists see it? Probably not.

Friday 12 June 2009

Serendipity reveals ontological description of BBC programmes

I have been enjoying Flight of the Conchords on BBC Four recently. Unfortunately, I missed the first couple of episodes of the new series. So that I could configure my Humax HDR to record all future episodes, I visited the BBC website to access their online schedule. It was while doing this that I discovered visible usage of the BBC's Programmes Ontology. The programme title (i.e. Flight of the Conchords) is hyperlinked to an RDF file on this schedule page.

The Semantic Web is supposed to provide machine readable data, not human readable data, and hyperlinking to an RDF/XML file is clearly a temporary glitch at the Beeb. After all, 99.99% of BBC users clicking on these links would be hoping to see further details about the programme, not to be presented with a bunch of angle brackets. Nevertheless, this glitch provides an interesting insight for us since it reveals the extent to which RDF data is being exposed on the Semantic Web about BBC programming, and the vocabularies the BBC are using. Researchers at the BBC are active in dissemination (e.g. ESWC2009, XTech 2008), but it's not often that you serendipitously discover this sort of stuff in action at an organisation like this.

The Programmes Ontology is based significantly on the Music Ontology Specification and the FOAF Vocabulary Specification, but the BBC's data also deploys Dublin Core and SKOS – although in the example below SKOS appears only in the namespace declarations.

Oh, and the next episode of Flight of the Conchords is on tonight at 23:00, BBC Four.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf = "http://xmlns.com/foaf/0.1/"
xmlns:po = "http://purl.org/ontology/po/"
xmlns:mo = "http://purl.org/ontology/mo/"
xmlns:skos = "http://www.w3.org/2008/05/skos#"
xmlns:time = "http://www.w3.org/2006/time#"
xmlns:dc = "http://purl.org/dc/elements/1.1/"
xmlns:dcterms = "http://purl.org/dc/terms/"
xmlns:wgs84_pos= "http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:timeline = "http://purl.org/NET/c4dm/timeline.owl#"
xmlns:event = "http://purl.org/NET/c4dm/event.owl#">

<rdf:Description rdf:about="/programmes/b00l22n4.rdf">
<rdfs:label>Description of the episode Unnatural Love</rdfs:label>
<dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-06-02T00:14:09+01:00</dcterms:created>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-06-02T00:14:09+01:00</dcterms:modified>
<foaf:primaryTopic rdf:resource="/programmes/b00l22n4#programme"/>
</rdf:Description>

<po:Episode rdf:about="/programmes/b00l22n4#programme">
<dc:title>Unnatural Love</dc:title>
<po:short_synopsis>Jemaine accidentally goes home with an Australian girl he meets at a nightclub.</po:short_synopsis>
<po:medium_synopsis>Comedy series about two Kiwi folk musicians in New York. When Bret and Jemaine go out nightclubbing with Dave, Jemaine accidentally goes home with an Australian girl.</po:medium_synopsis>
<po:long_synopsis>When Bret and Jemaine go out nightclubbing with Dave, Jemaine accidentally goes home with an Australian girl. At first plagued by shame and self-doubt, he comes to care about her, much to Bret and Murray&#39;s annoyance. Can their love cross the racial divide?</po:long_synopsis>
<po:masterbrand rdf:resource="/bbcfour#service"/>
<po:position rdf:datatype="http://www.w3.org/2001/XMLSchema#int">5</po:position>
<po:genre rdf:resource="/programmes/genres/comedy/music#genre" />
<po:genre rdf:resource="/programmes/genres/comedy/sitcoms#genre" />
<po:version rdf:resource="/programmes/b00l22my#programme" />
</po:Episode>

<po:Series rdf:about="/programmes/b00kkptn#programme">
<po:episode rdf:resource="/programmes/b00l22n4#programme"/>
</po:Series>

<po:Brand rdf:about="/programmes/b00kkpq8#programme">
<po:episode rdf:resource="/programmes/b00l22n4#programme"/>
</po:Brand>
</rdf:RDF>
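
For what it's worth, consuming this data is trivial with something like rdflib; the sketch below assumes the file above has been saved locally as b00l22n4.rdf and simply pulls out the episode title and short synopsis using the vocabularies the file itself declares:

# Minimal rdflib sketch; assumes the RDF above is saved as b00l22n4.rdf
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

PO = Namespace("http://purl.org/ontology/po/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
# resolve the file's relative URIs against the BBC programme page
g.parse("b00l22n4.rdf", format="xml",
        publicID="http://www.bbc.co.uk/programmes/b00l22n4.rdf")

for episode in g.subjects(RDF.type, PO.Episode):
    print("Episode: ", g.value(episode, DC.title))
    print("Synopsis:", g.value(episode, PO.short_synopsis))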

Quasi-facetted retrieval of images using emotions?

As part of my literature catch-up I found an extremely interesting paper in JASIST by S. Schmidt and Wolfgang G. Stock entitled 'Collective indexing of emotions in images: a study in emotional information retrieval'. The motivation behind the research is simple: images tend to elicit emotional responses in people. Is it therefore possible to capture these emotional responses and use them in image retrieval?

An interesting research question indeed, and Schmidt and Stock's study found that 'yes', it is possible to capture these emotional responses and use them. In brief, their research asked circa 800 users to tag a variety of public images from Flickr using a scroll-bar tagging system. This system allowed users to tag images according to a series of specially selected emotional responses and to indicate the intensity of these emotions. Schmidt and Stock found that users tended to have favourite emotions, which can obviously differ between users; however, for a large proportion of images the consistency of emotion tagging was very high (i.e. a large proportion of users frequently experienced the same emotional response to an image). It's a complex area of study and their paper is recommended reading precisely for this reason (capturing emotions anyone?!), but their conclusions suggest that:
"…it seems possible to apply collective image emotion tagging to image information systems and to present a new search option for basic emotions."
To what extent does the image above (by D Sharon Pruitt) make you feel happiness, anger, sadness, disgust or fear? It is early days, but the future application of such tools could find a place within the growing suite of image filters that many search engines have recently unveiled. For example, yesterday Keith Trickey was commenting on the fact that the image filters in Bing are better than those in Google or Yahoo!. True. There are more filters, and they seem to work better. In fact, they provide a species of quasi-taxonomical facets: (by) size, layout, color, style and people. It's hardly Ranganathan's PMEST, but – keeping in mind that no human intervention is required - it's a useful quasi-facet way of retrieving or filtering images, albeit a flat one.

An emotional facet, based on Schmidt and Stock's research, could easily be added to systems like Bing. In the medium term, though, it is Yahoo! that is better placed to harness the potential of emotional tagging. They own Flickr and have recently incorporated the searching and filtering of Flickr images within Yahoo! Image Search. As Yahoo! are keen for us to use Image Search to find CC images for PowerPoint presentations, or to illustrate a blog, being able to filter by emotions would be a useful addition to the filtering arsenal.
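
To illustrate what such an emotion facet might look like in practice, here is a very rough sketch – the ratings data and threshold are invented, though the emotion labels follow Schmidt and Stock's basic emotions – aggregating scroll-bar style intensity scores and filtering images by a dominant emotion:

# Illustrative sketch only: aggregate per-image emotion intensities (0-10)
# from different taggers and filter by mean intensity (all data invented).
from statistics import mean

ratings = {
    "img_001": {"happiness": [9, 8, 10, 7], "sadness": [1, 0, 2, 1]},
    "img_002": {"sadness": [8, 7, 9], "fear": [6, 5, 7]},
}

def filter_by_emotion(emotion, threshold=6.0):
    # Images whose mean intensity for the given emotion meets the threshold
    return [img for img, scores in ratings.items()
            if emotion in scores and mean(scores[emotion]) >= threshold]

print(filter_by_emotion("happiness"))   # ['img_001']
print(filter_by_emotion("sadness"))     # ['img_002']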

Thursday 11 June 2009

Bada Bing!

So much has been happening in the world of search engines since spring this year. This much can be evidenced from the postings on this blog. All the (best) search engines have been active in improving user tools, features, extra search functionality, etc. and there is a real sense that some serious competition is happening at the moment. It's all exciting stuff…

Last week Microsoft officially released its new Bing search engine. I've been using it, and it has found things Google hasn't been able to. The critics have been extremely impressed by Bing too, and some figures suggest that it is stealing market share and pushing Yahoo! out of the number 2 spot. What about number 1?

The trouble is that it doesn't matter how good your search engine is, because it will always have difficulty interrupting users' habitual use of Google. Indeed, Google's own research has demonstrated that the mere presence of the Google logo atop a result set is a key determinant of whether a user is satisfied with their results or not. In effect, users can be shown results from Yahoo! branded as Google, and vice versa, yet they will still prefer the results carrying the Google branding. Thus, users are generally unable to tell whether there is any real difference in the results (i.e. their precision, relevance, etc.) and are actually more influenced by the brand and their past experience. It's depressing, but a reality for the likes of Microsoft, Yahoo!, Ask, etc.

Francis Muir has the 'Microsoft mantra'. He predicts that in the long run Microsoft is always going to dominate Google – and I am starting to agree with him. Microsoft sit back, wait for things to unfold, and then develop something better than their previously dominant competitors. True – they were caught on the back foot with Web searching, but Bing is at least as good as Yahoo!, perhaps better, and it can only improve. Their contribution to cloud computing (SkyDrive) offers 25GB of storage, integration with Office and email, etc., and is far better than anything else available. Google documents? Pah! Who are you going to share that with? And then you consider Microsoft's dominance in software, operating systems, programming frameworks, databases, etc. Integrating and interoperating with this stuff over the Web is a significant part of the Web's future. Google is unlikely to be part of this, and for once I'm pleased.

It is not Microsoft's intention to take on Google's dominance of the Web at the moment, but I reckon Bing is certainly part of the long-term strategy. The Muir prophecy is one step closer, methinks.

Cracking open metadata and cataloguing research with Resource Description & Access (RDA)

I have been taking the opportunity to catch up with some recently published literature over the past couple of weeks. While perusing the latest issue of the Bulletin of the American Society for Information Science and Technology (the magazine which complements JASIST), I read an interesting article by Shawne D. Miksa (associate professor at the College of Information, University of North Texas). Miksa's principal research interests reside in metadata, cataloguing and indexing. She has been active in disseminating information about Resource Description & Access (RDA) and has a book in the pipeline designed to demystify it.

RDA has been in development for several years now; it is the successor to AACR2 and provides rules and guidance on the cataloguing of information entities. I use the phrase 'information entities' since RDA departs significantly from AACR2. The foundations of AACR2 were created prior to the advent of the Web, and this remains problematic given the digital and new media information environment in which we now exist. Of course, more recent editions of AACR2 have attempted to better accommodate these developments, but fire fighting was always the order of the day. The now re-named Joint Steering Committee for the Development of RDA has known for quite some time that an entirely new approach was required – and a few years ago radical changes to AACR2 were announced. As my ex-colleague Gordon Dunsire describes in a recent D-Lib Magazine article:
"RDA: Resource Description and Access is in development as a new standard for resource description and access designed for the digital world. It is being built on the foundation established for the Anglo-American Cataloguing Rules (AACR). Although it is being developed for use primarily in libraries, it aims to attain an effective level of alignment with the metadata standards used in related communities such as archives, museums and publishers, and to provide a better fit with emerging database technologies."
The ins and outs of RDA are a bit much for this blog; suffice to say that RDA is ultimately designed to improve the resource discovery potential of digital libraries and other retrieval systems by utilising the FRBR conceptual entity-relationship model (see this entity-relationship diagram at the FRBR blog). FRBR provides a holistic approach to users' retrieval requirements by establishing the relationships between information entities and allowing users to traverse the hierarchical relationships therein. I am an advocate of FRBR and appreciate its retrieval potential. Indeed, I often direct postgraduate students to Fiction Finder, an OCLC Research prototype which demonstrates the FRBR Work-Set Algorithm.
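
For readers unfamiliar with FRBR, the Group 1 entities form a simple hierarchy – Work, Expression, Manifestation, Item – which is what allows a user to move from 'the novel' down to 'the copy on the shelf'. A minimal sketch (the bibliographic data is invented for illustration):

# Minimal sketch of FRBR Group 1 entities and their hierarchy
from dataclasses import dataclass, field

@dataclass
class Item:                 # a single exemplar, e.g. one physical copy
    identifier: str

@dataclass
class Manifestation:        # a publication embodying an expression
    description: str
    items: list = field(default_factory=list)

@dataclass
class Expression:           # a realisation of the work, e.g. a translation
    description: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:                 # the abstract intellectual creation
    title: str
    expressions: list = field(default_factory=list)

work = Work("Wuthering Heights", [
    Expression("original English text", [
        Manifestation("Penguin Classics paperback, 2003",
                      [Item("Library copy, barcode 3049871")]),
    ]),
])

# Traverse the hierarchy from Work down to Items
for e in work.expressions:
    for m in e.manifestations:
        for i in m.items:
            print(work.title, "->", e.description, "->", m.description, "->", i.identifier)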

Reading Miksa's article was interesting for two reasons. Firstly, RDA has fallen off my radar recently. I used to be kept abreast of RDA developments through the activities of my colleague Gordon, who also disseminates widely on RDA and feeds into the JSC's work. Miksa's article – which announces the official release of RDA in the second half of 2009 – was almost like being in a time machine! RDA is here already! Wow! It seems like only last week that the JSC started work on RDA (...but it was actually over 5 years ago…).

The development of RDA has been extremely controversial, and Miksa alludes to this in her article – metadata gurus clashing with traditional cataloguers clashing with LIS revolutionaries. It has been pretty ugly at times. But secondly – and perhaps more importantly – Miksa's article is a brilliant call to arms for more metadata research. Not only that, she notes areas where extensive research will be mandatory to bring truly FRBR-ised digital libraries to fruition, including consideration of how this impacts upon LIS education.

A new dawn? I think so… Can the non-believers grumble about that? Between the type of developments noted earlier and RDA, the future of information organisation is alive and kicking.

Thursday 4 June 2009

Fight! Google Squared vs. WolframAlpha

By now we all realise that WolframAlpha is not intended to compete with Google's Universal Search; it's a 'computational knowledge engine' designed to serve up facts, data and scientific knowledge and is an entirely different beast. Nevertheless, Google is not a company to be outdone and has just announced the release of Google Squared which, if the technology press is to be believed, is Google's attempt to usurp WolframAlpha's grip on offering up facts, data and knowledge. Indeed, Google attempted to steal WolframAlpha's thunder by announcing that Google Squared was in development on the same day Stephen Wolfram was unveiling WolframAlpha for the first time a few weeks ago. Meow!

In the same way that WolframAlpha occupies a different intellectual space to most web search engines, Google Squared seems to be quite different to WolframAlpha. Says the Official Google Blog:
"Google Squared is an experimental search tool that collects facts from the web and presents them in an organized collection, similar to a spreadsheet. If you search for [roller coasters], Google Squared builds a square with rows for each of several specific roller coasters and columns for corresponding facts, such as image, height and maximum speed."
Google Squared appears to work best when the query submitted is conducive to comparison – say, species of snakes or country rock bands. With the former you retrieve a variety of snake types, images and descriptions, as well as biological taxonomic classification data; with the latter, genre and date of band formation are retrieved (including Dillard & Clark and the Flying Burrito Brothers), in addition to images and descriptions. Many of the data values are incorrect, but Google has been quite forthright in stating that Google Squared is extremely experimental ("This technology is by no means perfect"; "Google Squared is an experimental search tool"). Of course, Google wants us to explore their canned searches, such as Rollercoasters or African countries, to best appreciate what is possible.

As we noted recently though, place names are good for testing these systems and, as with WolframAlpha, some bizarre results are retrieved. A search for Liverpool seems only to retrieve facts on assorted Liverpool F.C. players, and Glasgow retrieves persons associated with Glasgow and the death of Glasgow Central train station in 1989(!) I had hoped Google Squared's comparative power might have pulled together facts and statistics on Glasgow (UK) with the ten or so places named Glasgow in the USA and Canada. A similar result would have been expected for Liverpool or Manchester (which has far, far more namesakes), but alas. This is a particular shame given that much of this data is available on Wikipedia in a relatively structured format, with disambiguation pages to help.

Google Squared allows users to correct data, remove erroneous results or suggest better results. The effect of this is a dynamically evolving result set. A search for a popular topic an hour ago can yield an entirely different result an hour later. All of this will help Google Squared become more accurate and cleverer over time.

Although Google Squared and WolframAlpha are quite different, there are some similarities. For this reason it is possible to state that the current score is 1-0 to WolframAlpha.