Tuesday, 7 October 2008

Search engines: solving the 'Anomalous State of Knowledge'

Information retrieval (IR) remains one of the most active areas of research within the information, computing and library science communities. It also remains one of the sexiest. The growth in information retrieval sex appeal has a clear correlation with the growth of the Web and the need for improvements in retrieval systems based on automatic indexing. No doubt the flurry of big name academics and Silicon Valley employees attending conferences such as SIGIR also adds glamour. Whatever its source, the allure of IR research has precipitated some of the best innovations in IR ever, as well as creating some of the most important search engines and business brands. Of course, asked to pick their favourite search engine or brand from a list, most would probably select Google.

The habitual use of Google by students (and by real people generally!) was discussed in a previous post and needn't be revisited here. Nevertheless, one of the most distressing aspects of Google (for me, at least!) is a recent malaise in its commitment to search. There have been some impressive innovations in a variety of search engines in a variety of areas. For example, Yahoo! is to better harness metadata and Semantic Web data on the Web. More interestingly though, some recent and impressive innovations in solving the 'ASK conundrum' are visible in a variety of search engines, but not in Google. Although Google always tells us that search is its bread and butter, is it spreading itself a little too thinly? Or, with a brand loyalty second to none and the robust PageRank algorithm deployed to good effect, is Google resting on its laurels?

In 1982 a young Nicholas J. Belkin spearheaded a series of seminal papers documenting various models of users' information needs in IR. These papers remain relevant today and are frequently cited. One of Belkin et al.'s central suppositions is that the user suffers from the so-called Anomalous State of Knowledge, which can be conveniently acronymized to 'ASK'. Their supposition can be summarised by the following quote from their JDoc paper:
"[P]eople who use IR systems do so because they have recognised an anomaly in their state of knowledge on some topic, but they are unable to specify precisely what is necessary to resolve that anomaly. ... Thus, we presume that it is unrealistic (in general) to ask the user of an IR system to say exactly what it is that she/he needs to know, since it is just the lack of that knowledge which has brought her/him to the system in the first place".
This astute deduction ushered in a branch of IR research that sought to improve retrieval by resolving the Anomalous State of Knowledge, for example by assisting the user in the query formulation process, or by helping users 'fill in the blanks' to improve recall (e.g. query expansion).
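To make 'query expansion' a little more concrete, here is a minimal sketch in Python. It is not how any particular search engine does it: the RELATED_TERMS table and the expand_query function are hypothetical and hand-built purely for illustration, whereas a real system would more likely draw related terms from a thesaurus, co-occurrence statistics or query logs.

```python
# Hypothetical, hand-built table of related terms (an assumption for this
# sketch); a real system would derive these from a thesaurus or query logs.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie", "cinema"],
}


def expand_query(query, related=RELATED_TERMS):
    """Return the query terms plus any related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the user's own term first, then append its related terms.
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("car film review"))
```

The user who types 'car' may not know that the documents they want say 'automobile'; adding the related terms to the query is one simple way of filling in that blank and improving recall.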

Last winter Yahoo! unveiled its 'Search Assist' facility (see screenshot above - search for 'united nations'), which provides real-time query formulation assistance to the user. Providing these facilities in systems based on metadata has always been possible owing to the use of controlled vocabularies for indexing, the use of name authority files, and even content standards such as AACR2; but providing a similar level of functionality with unstructured information is difficult. Yet Yahoo! provides something ... and it can be useful and can actually help resolve the ASK conundrum!
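I have no idea how Search Assist works under the bonnet, but the general idea of suggesting queries as the user types can be sketched very simply. The Python below assumes a log of past queries with counts (QUERY_LOG, with entirely made-up figures) and a hypothetical suggest function that returns the most popular logged queries beginning with whatever has been typed so far.

```python
import bisect

# Hypothetical query log with made-up frequencies, for illustration only.
QUERY_LOG = {
    "united nations": 120,
    "united airlines": 95,
    "united kingdom": 80,
    "unit converter": 40,
    "university rankings": 15,
}

# Sorting the queries means all those sharing a prefix sit next to each other.
SORTED_QUERIES = sorted(QUERY_LOG)


def suggest(prefix, limit=3):
    """Return up to `limit` logged queries starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    # Binary search for the first query that is >= the prefix.
    start = bisect.bisect_left(SORTED_QUERIES, prefix)
    matches = []
    for query in SORTED_QUERIES[start:]:
        if not query.startswith(prefix):
            break  # past the block of queries sharing this prefix
        matches.append(query)
    # Rank the matching queries by how often they were issued.
    matches.sort(key=lambda q: -QUERY_LOG[q])
    return matches[:limit]


print(suggest("united"))
```

With the sample log, typing 'united' would offer 'united nations' first, followed by the airline and the country. The point is that the suggestions come from what other people have searched for rather than from a controlled vocabulary, which is what makes this feasible over unstructured information.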


Similarly, meta-search engine Clusty has provided its 'clustering' techniques for quite some time. These clusters group related concepts and are designed to aid in query formulation, but also to provide some level of relevance feedback to users (see screenshot above - search for 'George Macgregor'). Of course, these clusters can be a bit hit or miss but, again, they can improve retrieval and aid the user in query formulation. Similar developments can also be found in Ask. View this canned search, for example. What help does Google provide?

The bottom line is that some search engines are innovating endlessly and putting the fruits of a sexy research area to good use. These search engines are actually moving search forward. Can the same still be said of Google?