The Information Strategy Group at Liverpool Business School, Liverpool John Moores University, offers courses and undertakes research in areas pertaining to information management, business information systems, communications and public relations, and library and information science.
My interest in information retrieval means that subscribing to search engine blogs (among other things) is essential. The most active blog to which I subscribe is the Official Google Blog. According to Google, the OGB provides "insights from Googlers into our products, technology, and the Google culture". More simply, the OGB is the place to look for developments in search, particularly those which Google wants to shout about.
There was a time (probably around two years ago) when updates to the OGB occurred every other week, and often the receipt of the RSS feed would compel me to post to this blog, such was the gravity of OGB announcements (see this, this and this, for example). However, in the past six months the OGB has been in overdrive. Almost every day a major Google announcement is made on the OGB, whether it's the launch of Google Instant or significant developments to Google Docs. Enter Google New, a dedicated website for finding all things new from Google. Here's the rationale from Google as published – yup, you guessed it – on the OGB:
"If it seems to you like every day Google releases a new product or feature, well, it seems like that to us too. The central place we tell you about most of these is through the official Google Blog Network [...] But if you want to keep up just with what’s new (or even just what Google does besides search), you’ll want to know about Google New. A few of us had a 20 percent project idea: create a single destination called Google New where people could find the latest product and feature launches from Google. It’s designed to pull in just those posts from various blogs."
The Official Google Blog has just announced an HTML5 Chrome Experiment in association with the Canadian indie rock band Arcade Fire. The experiment appears to function as a marketing exercise for both Chrome and Arcade Fire, although it also demonstrates Google's commitment to HTML5 (and it appears to be part of a wider partnership with Arcade Fire, as the video below indicates).
HTML5 is still under development but is the next major revision of the HTML standard (as distinct from the recent incorporation of RDF into XHTML, i.e. XHTML+RDFa). HTML5 will still be optimised for structuring and presenting content on the Web; however, it includes numerous new elements to better incorporate multimedia (currently heavily dependent on third-party plug-ins), native drag-and-drop functionality, and improved support for semantic microdata, among many other things...
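To give a flavour of the multimedia elements in question, here is a minimal sketch of HTML5's native video and canvas elements (the file name and the drawing are invented for illustration):

```html
<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>HTML5 sketch</title></head>
<body>
  <!-- Native video playback: no third-party plug-in required -->
  <video src="clip.webm" controls width="480">
    Fallback text for browsers without HTML5 video support.
  </video>

  <!-- Canvas: a scriptable 2D drawing surface -->
  <canvas id="c" width="200" height="100"></canvas>
  <script>
    var ctx = document.getElementById('c').getContext('2d');
    ctx.fillStyle = 'navy';
    ctx.fillRect(10, 10, 180, 80); // draw a filled rectangle
  </script>
</body>
</html>
```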
The Chrome Experiment, entitled 'The Wilderness Downtown', uses a variety of HTML5 building blocks. In their words:
"Choreographed windows, interactive flocking, custom rendered maps, real-time compositing, procedural drawing, 3D canvas rendering... this Chrome Experiment has them all. "The Wilderness Downtown" is an interactive interpretation of Arcade Fire's song "We Used To Wait" and was built entirely with the latest open web technologies, including HTML5 video, audio, and canvas."
Being an 'experiment' it can be a little over the top, and I suppose it isn't an accurate reflection of how HTML5 will be used in practice. Nevertheless, it is certainly worth checking out – and I was quite impressed with canvas. An HTML5-compliant browser is required, as well as some time (it took seven minutes to load!).
Google has been flirting with the Semantic Web recently, and we've talked about it occasionally on this blog. However, compared with other web search engines (e.g. Yahoo!) and the state of Semantic Web activity generally, Google has been slow to dive in completely. They have restricted themselves to rich snippets, using bits of RDFa and microformats, and inventing some vocabulary of their own too. Perhaps this was because their intention was always to purchase a prominent Semantic Web start-up company instead of putting in the spade work themselves? Perhaps so.
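For context, rich snippets rely on markup along these lines – a minimal sketch using the data-vocabulary.org RDFa vocabulary Google documented for the purpose (the person, institution and URL are invented):

```html
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <!-- Properties a search engine can lift into a rich snippet -->
  <span property="v:name">Jane Example</span>,
  <span property="v:title">Lecturer</span> at
  <span property="v:affiliation">Liverpool Business School</span>.
  <a rel="v:url" href="http://example.org/jane">Homepage</a>
</div>
```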
My comments are limited to the above; I just thought this was an extremely important development and one to watch. A high level of social proof appears to be required before some tech firms or organisations will embrace the Semantic Web – and what greater social proof is there than Google? Google also appear committed to the Freebase ethos:
"[We] plan to maintain Freebase as a free and open database for the world. Better yet, we plan to contribute to and further develop Freebase and would be delighted if other web companies use and contribute to the data. We believe that by improving Freebase, it will be a tremendous resource to make the web richer for everyone. And to the extent the web becomes a better place, this is good for webmasters and good for users."
The area of educational software is not completely alien to Google. The Google Apps Education Edition (providing email, collaboration widgets, etc.) has been around for a while now (I think) and – as the article insinuates – moving deeper into educational software seems a natural progression and provides Google with clear access to a key demographic. This is all conjecture of course; but if Google acquired Blackboard I think I would be in two minds. Part of me would think, "Great – Google will make Blackboard less clunky, and offer more functionality and more flexibility". But the other part (which is slightly bigger, I think) would feel extremely uncomfortable that Google is yet again moving into new areas, probably with the intention of dominating them.
We forget how huge and pervasive Google is today. Google is everywhere, reaching far beyond its dominant position in search into virtually every significant area of web and software development. If Google were Microsoft, the US Government and the EU would be all over it like a rash for pushing the boundaries of antitrust legislation and competition law. The situation takes on a rather sinister tone when you consider what HE would look like if Blackboard became a Google subsidiary. Edge Hill University is one of several institutions which have elected to ditch fully integrated institutional email applications (e.g. MS Outlook, Thunderbird) in favour of Google Mail. Having a VLE maintained by Google therefore sets the alarm bells ringing. The key technological interactions for a 21st century student are as follows: email, web, VLE, library. Picture it – a student existence entirely dependent upon one company and the directed advertising that goes with it: Google Mail for email; the web (where the first port of call is likely to be Google, of course); GoogleBoard (the name of Blackboard if they decided to re-brand it!); and the massive digital library Google is attempting to build, which would essentially amount to a de facto digital library monopoly.
I'm probably getting ahead of myself. The acquisition of Blackboard probably won't happen, and the digital library has encountered plenty of opposition, not least from Angela Merkel; but it does get me thinking that Google finally needs reining in. Even before this news broke I was starting to think that Google was turning into a Sesame Street-style Cookie Monster, devouring everything in sight. Their ubiquity can't possibly be healthy anymore, can it? Or am I being completely paranoid?
Google unveiled Wave at their Google I/O conference in late May 2009. The Wave development team presented a lengthy demonstration of what it can do and – given that it was probably a well-rehearsed presentation and demo – Wave looked pretty impressive. It might be a little boring of me, but I was particularly impressed by the context-sensitive spell checker ("Icland is an icland" – amazing!). Those of you who missed that demonstration can check it out in the video below. And try not to get annoyed at the sycophantic applause of the assembled Google developers...
Since then Wave has been hyped up by the technology press and even made mainstream news headlines at the BBC, Channel 4 News, etc. when it went on limited (invitation-only) release last week. Dot.life has reviewed Wave and the verdict was not particularly positive. Surprisingly, the reviewers (Rory Cellan-Jones, Stephen Fry, Bill Thompson and others) found it difficult to use and rather chaotic. I'm now anxious to try it out myself because I was convinced it would be amazing. Their review is funny and worth reading in full; but the main issues were noted as follows:
"Well, I'm not entirely sure that our attempt to use Google Wave to review Google Wave has been a stunning success. But I've learned a few lessons.
First of all, if you're using it to work together on a single document, then a strong leader (backed by a decent sub-editor, adds Fildes) has to take charge of the Wave, otherwise chaos ensues. And that's me - so like it or lump it, fellow Wavers.
Second, we saw a lot of bugs that still need fixing, and no very clear guide as to how to do so. For instance, there is an "upload files" option which will be vital for people wanting to work on a presentation or similar large document, but the button is greyed out and doesn't seem to work.
Third, if Wave is really going to revolutionise the way we communicate, it's going to have to be integrated with other tools like e-mail and social networks. I'd like to tell my fellow Wavers that we are nearly done and ready to roll with this review - but they're not online in Wave right now, so they can't hear me.
And finally, if such a determined - and organised - clutch of geeks and hacks struggle to turn their ripples and wavelets into one impressive giant roller, this revolution is going to struggle to capture the imagination of the masses."
My biggest concern about Wave was the important matter of critical mass, and this is something the dot.life review hints at too. A tool like Wave is only ever going to take off if large numbers of people buy into it – if, say, your organisation suddenly dumps all existing communication and collaboration tools in favour of Wave. It's difficult to see that happening any time soon...
The advent of Web 2.0 has brought about a huge increase in interactive websites and dynamic page content, much of which is delivered using AJAX ('Asynchronous JavaScript and XML', not a popular household cleaner!). AJAX is great and furnished me with my iGoogle page years ago; but increasingly websites use it to deliver page content which might otherwise be delivered as static XHTML pages. This presents a big problem for search engines because content loaded via AJAX is currently un-indexable (if that is a word!), and a lot of content is therefore invisible to all search engines. Indeed, the latest web design mantra has been "don't publish in AJAX if you want your website to be visible". (There are also accessibility and usability issues, but these are an aside for this posting...)
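To illustrate the problem, here is a minimal sketch of AJAX-delivered content (the URL and element id are invented): a crawler fetching the raw HTML sees only the placeholder, never the content loaded by the script.

```html
<div id="results">Loading...</div>
<script>
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/data/items.html', true); // asynchronous request
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      // This content arrives after page load via JavaScript, so a
      // crawler reading the static HTML never sees it.
      document.getElementById('results').innerHTML = xhr.responseText;
    }
  };
  xhr.send();
</script>
```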
The Webmaster Blog summarises:
"While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines."
Google's proposal involves shifting some of the responsibility to the administrator/webmaster of the website, who would set up a headless browser on the web server. (A headless browser is essentially a browser without a user interface; a piece of software that can access and render web documents but does not display them to human users.) The headless browser would then be used to programmatically execute the AJAX website on the server and provide an HTML 'snapshot' to search engines when they request it – which is a clever idea. The crux of Google's proposal is a set of URL conventions. These tell the search engine when to request the headless browser's HTML snapshot, and which URL to reveal to human users.
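For illustration, the URL convention works roughly like this (a sketch based on the scheme as published in Google's proposal; the site and parameter names are invented):

```text
Pretty URL shown to human users ('#!' marks a crawlable AJAX state):
  http://example.com/ajax.html#!page=2

URL the crawler requests instead, with the fragment escaped into a query parameter:
  http://example.com/ajax.html?_escaped_fragment_=page=2

The server routes the second form to the headless browser, which executes the
JavaScript and returns a static HTML snapshot for indexing; human users are
only ever shown the first form.
```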
It's good that Google are taking the initiative; my only concern is that they start trying to re-write standards, as they have done to some extent with RDFa. Their slides are below – enjoy!
So much has been happening in the world of search engines since spring this year, as the postings on this blog attest. All the (best) search engines have been busy improving user tools, features, extra search functionality, etc., and there is a real sense of serious competition at the moment. It's all exciting stuff…
Last week Microsoft officially released its new Bing search engine. I've been using it, and it has found things Google hasn't been able to. The critics have been extremely impressed by Bing too, and some figures suggest that it is stealing market share and pushing Yahoo! out of the number 2 spot. What about number 1?
The trouble is that it doesn't matter how good your search engine is, because it will always have difficulty interrupting users' habitual use of Google. Indeed, Google's own research has demonstrated that the mere presence of the Google logo atop a result set is a key determinant of whether a user is satisfied with their results. In effect, users can be shown results from Yahoo! branded as Google, and vice versa, and they will tend to prefer whichever results carry the Google branding. Users are generally unable to tell whether there is any real difference in the results (i.e. their precision, relevance, etc.) and are actually more influenced by the brand and their past experience. It's depressing, but a reality for the likes of Microsoft, Yahoo!, Ask, etc.
Francis Muir has the 'Microsoft mantra'. He predicts that in the long run Microsoft will always come to dominate Google – and I am starting to agree with him. Microsoft sit back, wait for things to unfold, and then develop something better than their previously dominant competitors. True – they were caught on the back foot with Web searching, but Bing is at least as good as Yahoo!, perhaps better, and it can only improve. Their contribution to cloud computing (SkyDrive) offers 25GB of storage, integration with Office and email, etc., and is better than anything comparable currently available. Google documents? Pah! Who are you going to share those with? And then you consider Microsoft's dominance in software, operating systems, programming frameworks, databases, etc. Integrating and interoperating with this stuff over the Web is a significant part of the Web's future. Google is unlikely to be part of this, and for once I'm pleased.
It is not Microsoft's intention to take on Google's dominance of the Web at the moment, but I reckon Bing is certainly part of the long-term strategy. The Muir prophecy is one step closer, methinks.
By now we all realise that WolframAlpha is not intended to compete with Google's Universal Search; it's a 'computational knowledge engine' designed to serve up facts, data and scientific knowledge, and is an entirely different beast. Nevertheless, Google is not a company to be outdone and has just announced the release of Google Squared which, if the technology press is to be believed, is Google's attempt to challenge WolframAlpha's grip on serving up facts, data and knowledge. Indeed, Google attempted to steal WolframAlpha's thunder by announcing that Google Squared was in development on the same day Stephen Wolfram unveiled WolframAlpha for the first time a few weeks ago. Meow!
"Google Squared is an experimental search tool that collects facts from the web and presents them in an organized collection, similar to a spreadsheet. If you search for [roller coasters], Google Squared builds a square with rows for each of several specific roller coasters and columns for corresponding facts, such as image, height and maximum speed."
Google Squared appears to work best when the query submitted is conducive to comparing members of a category – say, species of snake, or country rock bands. With the former you retrieve a variety of snake types, images and descriptions, as well as biological taxonomic classification data; with the latter, genre and date of band formation are retrieved (including Dillard & Clark and the Flying Burrito Brothers), in addition to images and descriptions. Many of the data values are incorrect, but Google has been quite forthright in stating that Google Squared is extremely experimental ("This technology is by no means perfect"; "Google Squared is an experimental search tool"). Of course, Google wants us to explore their canned searches, such as Rollercoasters or African countries, to best appreciate what is possible.
As we noted recently though, place names are a good test of these systems and, as with WolframAlpha, some bizarre results are retrieved. A search for Liverpool seems only to retrieve facts on assorted Liverpool F.C. players, and Glasgow retrieves persons associated with Glasgow and the death of Glasgow Central train station in 1989(!). I had hoped Google Squared's comparative power might have pulled together facts and statistics on Glasgow (UK) with the ten or so places named Glasgow in the USA and Canada. A similar result would have been expected for Liverpool or Manchester (which has far, far more namesakes), but alas. This is a particular shame because much of this data is available on Wikipedia in a relatively structured format, with disambiguation pages to help.
Google Squared allows users to correct data, remove erroneous results or suggest better ones. The effect is a dynamically evolving result set: a search for a popular topic can yield an entirely different result an hour later. All of this should help Google Squared become more accurate and cleverer over time.
Although Google Squared and WolframAlpha are quite different, they are similar enough to compare – and on current performance the score is 1-0 to WolframAlpha.
Just a quick post... Today the Guardian blog reports on the financial woes of YouTube. I don't suppose we should be particularly surprised to learn that, according to some news sources, YouTube is due to lose $470 million this year. When this figure is compared to the $1.65 billion price tag Google paid a couple of years ago, we can appreciate the magnitude of their YouTube predicament. The majority of this loss is attributable to the failure of advertising to bring home the bacon – a recurring issue on this blog – but huge running costs, copyright and royalty issues have played their part too. Google is reportedly interested in purchasing Twitter, but surely their failure to monetise YouTube – a service arguably more monetisable (?) than Twitter – should have the alarm bells ringing at Google HQ?
I find the current crossroads for many of these services utterly fascinating. I don't have any solutions for any of these ventures, other than to make sure you have a business model before starting any business. Would RBS give me a business loan without a business plan and a robust revenue model? Probably not. But then they are not giving loans out these days anyway...
I posted about Google's eye-tracking research last month. I'm loath to discuss Google again lest the ISG blog becomes known as the unofficial Google blog; however, the latest post on the Official Google Blog is worthy of some comment...
You might recall another post I made regarding search engine research and development, particularly in the area of information retrieval (IR) aids for users. In that posting I summarised Belkin's research and theories regarding the Anomalous State of Knowledge (ASK). Most of this and subsequent research has sought to introduce IR aids so that users can better resolve their ASK conundrum. This assistance varies but often takes the form of query expansion (in its various permutations), browsable subject trees to stimulate query formulation, relevance feedback, and so forth. Providing such tools in systems based on automatic indexing is difficult, but we noted that some search engines have introduced effective retrieval aids, all designed to alleviate the ASK problem. For example, Yahoo! provides its search assist tool, Clusty provides related concept clusters, and Ask provides other similar tools. Their accuracy in IR varies widely, but overall they prove useful to users. Unfortunately, we also noted that Google provides few user aids comparable to those above, arguably relying more on its PageRank algorithm. Not any longer...
Today Google launched some interface functionality not dissimilar to Yahoo! search assist and Clusty. The assistance takes the form of suggested related searches and some extra result summary text for particular results. Whether you receive this assistance depends on the nature of your query, so have a look at this canned search: 'communism in Russia'. This isn't bad and is better than nothing; but does it really measure up to the aids provided by competing search engines? Compare, for the same canned searches, the IR aids provided for the user by the systems we've discussed already.
Google's attempts appear quite pedestrian by comparison. Yahoo! and Clusty, for example, make their aids readily available so that the user can effect changes in their information seeking behaviour, but Google's tools are far less visible, less detailed, and offer far less functionality. Since a lot of research indicates that many users will not look below the 'golden triangle' (i.e. down to the bottom of the first result set), it is entirely feasible that these 'related search' aids will go unnoticed by the disoriented information seeker.
It is good to see Google deploying user query aids and reacting to developments in other IR systems, but it appears that it will be some time before Google can be said to alleviate users' Anomalous State of Knowledge.
While catching up on some blogs I follow, I noticed that the Semantic Web-ite Ivan Herman has posted comments regarding the US Congress SpaceBook – a US political answer to Facebook. He, in turn, was commenting on a post by ProgrammableWeb – the website dedicated to keeping us informed of the latest web services, mashups, and Web 2.0 APIs.
From a mashup perspective, SpaceBook is pretty incredible, incorporating (so far) 11 different Web APIs. For me, however, SpaceBook is interesting because it makes use of semantic data provided via FOAF and the XFN microformat. To do this SpaceBook makes good use of the Google Social Graph API, which aims to harness such data to generate social graphs. The Social Graph API has been available for almost a year but has had quite a low profile until now. Says the API website:
"Google Search helps make this information more accessible and useful. If you take away the documents, you're left with the connections between people. Information about the public connections between people is really useful -- as a user, you might want to see who else you're connected to, and as a developer of social applications, you can provide better features for your users if you know who their public friends are. There hasn't been a good way to access this information. The Social Graph API now makes information about the public connections between people on the Web, expressed by XFN and FOAF markup and other publicly declared connections, easily available and useful for developers."
Bravo! This creates some neat connections. Unfortunately – and as Ivan Herman regrettably notes – the generated FOAF data is inserted into Hillary Clinton's page as a page comment, rather than as a separate .rdf file or as RDFa. The FOAF file is also a little limited, but it does include links to her Twitter account. More puzzling for me, though, is why the embedded XHTML metadata does not use Qualified Dublin Core! Let's crank up the interoperability, please!
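Embedding Qualified Dublin Core in an XHTML head is hardly onerous, after all – something like this (a sketch following the DCMI convention for meta and link elements; the values are invented):

```html
<head>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.title" content="Congress SpaceBook" />
  <meta name="DC.creator" content="Example Author" />
  <meta name="DC.subject" content="US Congress; social networking" />
  <!-- Qualified DC: a refined term with an encoding scheme -->
  <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF" content="2009-01-20" />
</head>
```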
Anne Aula and Kerry Rodden have just published a posting on the Official Google Blog summarising some eye-tracking research they have been conducting on Google's 'Universal Search'. Both are active in information seeking behaviour and human-computer interaction research at Google and are well published within the related literature (e.g. JASIST, IPM, SIGIR, CHI, etc.).
The motivation behind their research was to evaluate the effect that incorporating thumbnail images and video within a result set has on users' information seeking behaviour. Previous information retrieval eye-tracking research indicates that users scan results in order, working down the list until they reach a (potentially) relevant result, or until they decide to refine their search query or abandon the search. Aula and Rodden were concerned that the inclusion of thumbnail images might disrupt this "well-established order of result evaluation". Some comparative evaluation was therefore the order of the day.
"We ran a series of eye-tracking studies where we compared how users scan the search results pages with and without thumbnail images. Our studies showed that the thumbnails did not strongly affect the order of scanning the results and seemed to make it easier for the participants to find the result they wanted."
A good finding for Google, of course; but most astonishing is the eye-tracking data. The speed with which users scanned result sets and the number of points on the interface they scanned was incredible. View the 'real time' clip below. A dot increasing in size denotes the length of time a user spent pausing at that specific point in the interface or result set. Some other interesting discoveries were made – the full posting is essential reading.
This post follows a series of others pontificating about the efficacy of search engines in information retrieval. Over the weekend Google announced the release of Google SearchWiki, which essentially allows users to customise searches by re-ranking, deleting, adding, and commenting on their results. This is personalised searching (see video below). As the Official Google Blog notes:
"With just a single click you can move the results you like to the top or add a new site. You can also write notes attached to a particular site and remove results that you don't feel belong."
The advantages of this are a little unclear at first; however, things become clearer when we learn that such changes can only be effected if you have an iGoogle account. Google have – quite understandably – been very specific about this aspect of SearchWiki. Search is their bread and butter; messing with the formula would be like dancing with the devil!
Google SearchWiki doesn't do anything further to address our Anomalous State of Knowledge (ASK), nor can I see myself using it, but it is an indication that Google is interested in better exploring the potential of social data to improve relevance feedback. Google will, of course, harvest vast amounts of data pertaining to users' information seeking behaviour, which can then be channelled into improving their bread and butter. (And from my perspective, I would be interested to know how they analyse such data and effect changes in their PageRank algorithm.) Their move also resonates with an increasing trend to support users in their Personal Information Management (PIM): to assist users in re-finding information they have previously located, or in frequently conducting the same searches over and over. It particularly reminds me of research undertaken by Bruce et al. (2004), which found, for example, that users increasingly choose not to bookmark a useful or interesting web page, but simply find it again – because they know they can. If you continually encounter information that is irrelevant to your needs, re-rank it accordingly – so the SearchWiki ethos goes...
Perusing recent blog commentary, it is clear that some consider this development to have business motivations. Technology guru and Wired magazine co-founder John Battelle thinks SearchWiki is an attempt to attract more users to iGoogle (whose user base is currently small), whilst simultaneously rendering iGoogle the centre of users' personal web universe. To my mind, Google is always about business. PageRank made Google a great free-text search tool, permitting huge market penetration. SearchWiki is simply another business tool which happens to offer some (vaguely?) useful functionality.