Tuesday, 2 November 2010

Crowd-sourcing faceted information retrieval

This blog has witnessed the demise of several search engines, all of which attempted to challenge the supremacy of the big innovators (and I would count Yahoo! and Bing among them, alongside the obvious market leader). Yesterday Blekko launched, and it looks set to become the next Cuil. Or does it?

Blekko presents a fresh attempt to move web search forward, taking a style of retrieval that has hitherto succeeded only in systems based on pre-coordinated indexes and combining it with crowd-sourcing techniques. Interestingly, Rich Skrenta, co-founder of Blekko, was also a principal founder of the Dmoz project. Remember Dmoz? When I worked on BUBL years and years ago, I recall considering Dmoz an inferior beast. Yet it remains alive and kicking, and still popular and relevant to modern web developments, with weekly RDF dumps of its rich, categorised, crowd-sourced content made available for Linked Data purposes. BUBL, on the other hand, has been static for years.

Flirting with taxonomic organisation and categorisation (as well as crowd-sourcing) at Dmoz has obviously influenced the Blekko approach to search. Blekko innovates in retrieval by enabling users to define their very own vertical search indexes using so-called 'slashtags', essentially providing a quasi form of faceted search. The advantage of this approach is that using a particular slashtag (or facet, if you prefer) in a query increases precision by removing 'irrelevant' results associated with other meanings of the query terms. Sounds good, eh? Ranganathan would be salivating at such functionality in automatic indexing! To achieve some form of critical mass, Blekko has supplied hundreds of slashtags that can be used straight away; but the future of slashtags depends on users creating their own, which Blekko will screen before adding them to its publicly available slashtag list. Blekko users can also help weed out poor results and any erroneous slashtag results (see the video below), thus contributing to the improved precision Blekko claims and maintaining slashtag efficacy. In fact, Skrenta proposes that the Blekko approach will improve precision in the longer term. Says Skrenta on the BBC dot.Maggie blog:
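To make the precision argument concrete, here is a small worked sketch in Python. It assumes that a slashtag behaves like a curated list of sites used as a filter, and every URL, site name and result set below is invented purely for illustration (this is not Blekko data or Blekko's implementation). Precision is simply the fraction of retrieved results that are relevant.

```python
def precision(results, relevant):
    """Fraction of retrieved results that are relevant (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r in relevant) / len(results)

# Hypothetical results for an ambiguous query such as "jaguar":
# half are about the car, half about the animal.
results = [
    "cars.example/jaguar-xf",
    "zoo.example/jaguar-facts",
    "autoreview.example/jaguar-xj",
    "wildlife.example/big-cats",
]
# The user actually wants the car meaning.
relevant = {"cars.example/jaguar-xf", "autoreview.example/jaguar-xj"}

# Hypothetical slashtag (facet) such as "/cars": a curated set of sites.
cars_slashtag = {"cars.example", "autoreview.example"}

# Keep only results whose site appears in the slashtag's list.
filtered = [r for r in results if r.split("/", 1)[0] in cars_slashtag]

print(precision(results, relevant))   # 0.5
print(precision(filtered, relevant))  # 1.0
```

Filtering by the facet discards the results tied to the unwanted meaning, so precision rises from 0.5 to 1.0 in this toy case.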
"The only way to fix this [precision problem] is to bring back large-scale human curation to search combined with strong algorithms. You have to put people into the mix […] Crowdsourcing is the only way we will be able to allow search to scale to the ever-growing web".
Let's look at a typical Blekko query. I am interested in the new Microsoft Windows mobile OS, and specifically in bona fide reviews of it. Moreover, since I am tech savvy and will already have read many reviews, I am only interested in reviews published recently (i.e. within the past two weeks or so). In Blekko we can search like so…

"windows mobile 7" /tech-reviews /date

…where the /tech-reviews slashtag limits results to genuine reviews published in the technology press and/or associated websites, and the /date slashtag orders the results by date. It works, and works spectacularly well. Skrenta sticks two fingers up at his competitors when he quips in the Blekko promotional video, "Try doing this [type of] search anywhere else!" Blekko provides 'Five use cases where slashtags shine', which, although each uses only one slashtag, illustrate how the approach can be applied to a variety of queries. Of course, Blekko can still be used like a conventional search engine, i.e. you enter a query and get results ranked by the Blekko algorithm. On this count, judging by my own personal 'search engine test queries', Blekko appears to rank relevant results sensibly and to index pages that other search engines either ignore or, if they index them at all, usually drown in spam that they rank as more relevant.
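The mechanics of a query like "windows mobile 7" /tech-reviews /date can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not how Blekko actually works: a filtering slashtag is modelled as a curated set of sites, /date is treated as an ordering directive rather than a filter, and the `SLASHTAGS` table, the `Result` records and the helper names `parse_query` and `apply_slashtags` are all hypothetical. The candidate results are assumed to have already been retrieved for the query terms.

```python
import shlex
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    url: str
    site: str
    score: float      # relevance score from a hypothetical ranking step
    published: date

# Hypothetical slashtag definitions: name -> curated set of sites.
SLASHTAGS = {
    "tech-reviews": {"reviews.example", "techpress.example"},
}

def parse_query(query):
    """Split a query into quoted/plain search terms and slashtags ('/name')."""
    terms, tags = [], []
    for token in shlex.split(query):  # keeps "windows mobile 7" as one phrase
        if token.startswith("/"):
            tags.append(token[1:])
        else:
            terms.append(token)
    return " ".join(terms), tags

def apply_slashtags(results, tags):
    """Filter by each site-list slashtag, then order by date or by score."""
    for tag in tags:
        if tag == "date":
            continue  # ordering directive, handled below
        sites = SLASHTAGS.get(tag)
        if sites is None:
            raise ValueError(f"unknown slashtag: /{tag}")
        results = [r for r in results if r.site in sites]
    if "date" in tags:
        return sorted(results, key=lambda r: r.published, reverse=True)
    return sorted(results, key=lambda r: r.score, reverse=True)

# Hypothetical candidates, assumed already matched against the terms.
candidates = [
    Result("https://reviews.example/wp7-review", "reviews.example", 0.70, date(2010, 10, 21)),
    Result("https://spamblog.example/wp7-deals", "spamblog.example", 0.95, date(2010, 11, 1)),
    Result("https://techpress.example/wp7-hands-on", "techpress.example", 0.80, date(2010, 10, 29)),
]

terms, tags = parse_query('"windows mobile 7" /tech-reviews /date')
for r in apply_slashtags(candidates, tags):
    print(r.published, r.url)
# 2010-10-29 https://techpress.example/wp7-hands-on
# 2010-10-21 https://reviews.example/wp7-review
```

The high-scoring spam page is dropped because its site is not on the /tech-reviews list, and the survivors come back newest first. Without /date, the same filter would return them in score order instead.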

There is a lot to admire about Blekko. Aside from an innovative approach to information retrieval, there is also a commitment to algorithmic openness and transparency, which SEO people will be pleased about. But I worry that, while a Blekko slashtag search is innovative and useful, most users will approach Blekko as just another search engine rather than buying into the importance of slashtags and, in doing so, will not hang around long enough to 'get it' (even though I intend to...). Indeed, to some extent Blekko has more in common with command-line searching of online databases in the days of yore. There are also some teething troubles, which rigorous testing can reveal. But there are reasons to be hopeful. Blekko is presumably hoping to promote slashtag popularity and have users following slashtags just as they follow Twitter groups, thus driving website traffic and, with it, advertising. Owning a popular slashtag could therefore be useful, and potentially highly profitable, even if Blekko remains small.

blekko: how to slash the web from blekko on Vimeo.

