Friday 5 December 2008

Some general musings on tag clouds, resource discovery and pointless widgets...

The efficacy of collaborative tagging in information retrieval and resource discovery has undergone some discussion on this blog in the past. Despite emerging a good couple of years ago, collaborative tagging – like many other Web 2.0 developments – remains a topic of uncertainty and an area lacking sufficient evaluation and research. A creature of collaborative tagging which has similarly evaded adequate evaluation is the (seemingly ubiquitous!) 'tag cloud'. Invented by Flickr (Flickr tag cloud) and popularised by delicious (and aren't you glad they dropped the irritating full stops in their name and URL a few months ago?), tag clouds are everywhere, cluttering interfaces with their differently and irritatingly sized fonts.

Coincidentally, a series of tag cloud themed research papers was discussed at one of our recent ISG research group meetings. One of the papers under discussion (Sinclair & Cardew-Hall, 2008) reported an experimental study comparing the usage and effectiveness of tag clouds with traditional search interface approaches to information retrieval. Their work is welcome, as it constitutes one of the few robust evaluations of tag clouds since they emerged several years ago.

One would hate to consider tag clouds as completely useless – and I have to admit to harbouring this thought. Fortunately, Sinclair and Cardew-Hall found tag clouds to be not entirely without merit. Whilst they are not conducive to precision retrieval and often conceal relevant resources, the authors found that users reported them useful for broad browsing and/or non-specific resource discovery. They were also found to be useful in characterising the subject nature of databases to be searched, thus aiding the information seeking process. The utility of tag clouds therefore remains confined to the search behaviours of inexperienced searchers and – as the authors conclude – cannot displace traditional search facilities or taxonomic browsing structures. As always, further research is required...

The only thing saving tag clouds from being completely useless is that they can occasionally assist you in finding something useful, perhaps serendipitously. What would be the point in having a tag cloud that didn't help you retrieve any information at all? Answer: There wouldn't be any point; but this doesn't stop some people. Recently we have witnessed the emergence of 'tag cloud generation' tools. Such tools generate tag clouds for Web pages, or text entered by the user. Wordle is one such example. They look nice and create interesting visualisations, but don't seem to do anything other than take a paragraph of text and increase the size of words based on frequency. (See the screen shot of a Wordle tag cloud for my home page research interests.)
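For the curious, the "increase the size of words based on frequency" trick can be sketched in a few lines of Python. This is only a rough illustration of the general idea and not how Wordle actually works; the function name tag_cloud_sizes, the point-size range and the top_n cut-off are all my own assumptions.

```python
import re
from collections import Counter


def tag_cloud_sizes(text, min_pt=10, max_pt=36, top_n=10):
    """Map the most frequent words in `text` to font sizes in points.

    Hypothetical helper: the size range and linear scaling are assumptions.
    """
    # Lower-case the text and pull out runs of letters as "words".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}

    highest = counts[0][1]   # frequency of the most common word
    lowest = counts[-1][1]   # frequency of the least common word kept
    spread = highest - lowest

    sizes = {}
    for word, n in counts:
        # Linear scaling between min_pt and max_pt; if every word has the
        # same frequency, give them all the largest size.
        scale = (n - lowest) / spread if spread else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


if __name__ == "__main__":
    print(tag_cloud_sizes("tag tag tag cloud cloud search"))
```

Note that nothing here knows what any of the words mean, which is rather the point: the output is a visualisation of word counts and nothing more.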


OCLC have developed their very own tag cloud generator. Clearly, this widget has been created while developing their suite of nifty services, such as WorldCat, DeweyBrowser, FictionFinder, etc., so we must hold fire on the criticism. But unlike Wordle, this is something OCLC could make useful. For example, if I generate a tag cloud via this service, I expect to be able to click on a tag and immediately initiate a search on WorldCat, or a variety of OCLC services … or the Web generally! In line with good information retrieval practice, I also expect stopwords to be removed. In my example some of the largest tags are nonsense, such as "etc", "specifically", "use", etc. But I guess this is also a fundamental problem with tagging generally...
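To make the two expectations above concrete (stopword removal and clickable tags), here is a minimal Python sketch. It is not OCLC's code or API: the stopword list is a tiny illustrative sample, and SEARCH_URL is a made-up placeholder endpoint rather than a real WorldCat address.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Illustrative stopword list only; a real service would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "etc", "specifically", "use"}

# Hypothetical search endpoint, used purely as a placeholder.
SEARCH_URL = "https://search.example.org/results"


def searchable_tags(text, stopwords=STOPWORDS, base_url=SEARCH_URL):
    """Return (tag, frequency, search_link) tuples, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords before counting so they never reach the cloud.
    counts = Counter(w for w in words if w not in stopwords)
    return [
        (tag, n, f"{base_url}?{urlencode({'q': tag})}")
        for tag, n in counts.most_common()
    ]


if __name__ == "__main__":
    for tag, n, link in searchable_tags("Use of metadata, specifically metadata etc."):
        print(tag, n, link)
```

With the stopwords filtered out, "etc", "specifically" and "use" never make it into the cloud, and each surviving tag carries a link that would start a search.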

OCLC are also in a unique position in that they have access to numerous terminologies. This obviously cracks open the potential for cross-referencing tags with their terminological datasets so that only genuine controlled subject terms feature in the tag cloud, or productive linkages can be established between tags and controlled terms. This idea is almost as old as tagging itself but, again, has taken until recently to be investigated properly. The connections between tags and controlled vocabularies are something the EnTag project, in which OCLC is a partner, has been investigating. In particular, EnTag (Enhanced Tagging for Discovery) has been exploring whether tag data, normally typified by its unstructured and uncontrolled nature, can be enhanced and rendered more useful by robust terminological data. The project finished a few months ago – and a final report is eagerly anticipated, particularly as my formative research group submitted a proposal to JISC but lost out to EnTag! C'est la vie!
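As a rough illustration of the cross-referencing idea described above, the Python sketch below maps free-text tags onto preferred terms from a controlled vocabulary and sets aside anything that does not match. It is emphatically not EnTag's or OCLC's method; the function name, the lookup-table approach and the tiny sample vocabulary are all assumptions of mine.

```python
def map_to_controlled_terms(tag_counts, vocabulary):
    """Split tags into those matching a controlled term and those that do not.

    `vocabulary` is a hypothetical lookup from lower-cased entry terms
    (including variants) to the preferred controlled term.
    """
    matched = {}
    unmatched = {}
    for tag, n in tag_counts.items():
        term = vocabulary.get(tag.strip().lower())
        if term is None:
            unmatched[tag] = n
        else:
            # Variant tags collapse onto one preferred term; counts add up.
            matched[term] = matched.get(term, 0) + n
    return matched, unmatched


if __name__ == "__main__":
    # Made-up sample vocabulary, for illustration only.
    vocabulary = {
        "ir": "Information retrieval",
        "information retrieval": "Information retrieval",
        "tagging": "Collaborative tagging",
    }
    tags = {"IR": 3, "information retrieval": 2, "Tagging": 4, "etc": 5}
    print(map_to_controlled_terms(tags, vocabulary))
```

Only the matched terms would then feature in the cloud, while the unmatched tags could be dropped or held back for review.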
