Information Strategy Group, LJMU

Thursday, 22 October 2009

Blackboard on the shopping list: do Google need reining in?

Alex Spiers (Learning Innovation & Development, LJMU) alerted me via Twitter to rumours in the 'Internet playground' that Google is considering branching out into educational software. According to the article spreading the rumour, Google plans to fulfil its recent pledge to acquire one small company per month by purchasing Blackboard.

The area of educational software is not completely alien to Google. The Google Apps Education Edition (providing email, collaboration widgets, etc.) has been around for a while now (I think) and - as the article insinuates - moving deeper into educational software seems a natural progression and provides Google with clear access to a key demographic. This is all conjecture of course; but if Google acquired Blackboard I think I would suffer a schizophrenic episode. A part of me would think, "Great - Google will make Blackboard less clunky, offer more functionality and more flexibility". But the other part (which is slightly bigger, I think) would feel extremely uncomfortable that Google is yet again moving into new areas, probably with the intention of dominating that area.

We forget how huge and pervasive Google is today. Google is everywhere and now reaches far beyond its dominant position in search into virtually every significant area of web and software development. If Google were Microsoft the US Government and the EU would be all over Google like a rash for pushing the boundaries of antitrust legislation and competition laws. This takes on a rather sinister tone when you consider the situation in HE if Blackboard becomes a Google subsidiary. Edge Hill University is one of several institutions which have elected to ditch fully integrated institutional email applications (e.g. MS Outlook, Thunderbird) in favour of Google Mail. Having a VLE maintained by Google therefore sets the alarm bells ringing. The key technological interactions for a 21st century student are as follows: email, web, VLE, library. Picture it - a student existence which would be entirely dependent upon one company and the directed advertising that goes with it: Google Mail, web (and their first port of call is likely to be Google, of course), GoogleBoard (the name of Blackboard if they decided to re-brand it!) and a massive digital library which Google is attempting to create and which would essentially create a de facto digital library monopoly.

I'm probably getting ahead of myself. The acquisition of Blackboard probably won't happen, and the digital library has encountered plenty of opposition, not least from Angela Merkel; but it does get me thinking that Google finally needs reining in. Even before this news broke I was starting to think that Google was turning into a Sesame Street-style Cookie Monster, devouring everything in sight. Their ubiquity can't possibly be healthy anymore, can it? Or am I being completely paranoid?

Monday, 19 October 2009

The Kindle according to Cellan-Jones

The world in which Rory Cellan-Jones exists is an interesting place. It's one which often results in a good, hard slap to the face. He can always be relied upon for some cynicism and negativity (or realism?) in his analysis of new technologies and tech related businesses. (See the last posting about Google Wave, for example.) This can be unexpected, often because he sees through the hype or aesthetics of many technologies and evaluates stuff based squarely on utilitarian principles. His overview of the Kindle is no exception to this rule:

"The Kindle looks to me like an attractive but expensive niche product, giving a few techie bibliophiles the chance to take more books on holiday without incurring excess baggage charges. But will it force thousands of bookshops to close and transform the economics of struggling newspapers? Don't bet on it."

The thing is, Cellan-Jones often talks a lot of sense. To be sure, the Kindle looks like an extremely smart piece of kit, but when Cellan-Jones stacks up the realities of the Kindle one wonders whether it'll be the game changer everyone is expecting it to be.

The focus for the Kindle seems to be on the best seller lists and the broadsheets. An area which appears to have eluded adequate exposition by all the tech commentators is the use of this new generation of e-book readers to deliver text books, learning materials, etc. This was always considered an important area for the early e-book readers. Why carry lots of heavy text books around when you could have them all on your Kindle or Sony Reader Touch, and be in a position to browse and search the content therein more effectively? Or, is this an extravagant use of E-Ink? E-Ink is required for lengthy reading sessions (e.g. reading a novel) rather than dipping in and out of text books to complete academic tasks, something for which a netbook or mobile device might be better. So what happens to the future of e-book readers in academia?

Friday, 9 October 2009

Wave a washout?

This is just a brief posting to flag up a review of Google Wave on the BBC dot.life blog.

Google unveiled Wave at their Google I/O conference in late May 2009. The Wave development team presented a lengthy demonstration of what it can do and – given that it was probably a well rehearsed presentation and demo – Wave looked pretty impressive. It might be a little bit boring of me, but I was particularly impressed by the context sensitive spell checker ("Icland is an icland" – amazing!). Those of you that missed that demonstration can check it out in the video below. And try not to get annoyed at the sycophantic applause of their fellow Google developers...

Since then Wave has been hyped up by the technology press and even made mainstream news headlines at the BBC, Channel 4 News, etc. when it went on limited (invitation only) release last week. Dot.life has reviewed Wave and the verdict was not particularly positive. Surprisingly they (Rory Cellan-Jones, Stephen Fry, Bill Thompson and others) found it pretty difficult to use and pretty chaotic. I'm now anxious to try it out myself because I was convinced that it would be pretty amazing. Their review is funny and worth reading in full; but the main issues were noted as follows:

"Well, I'm not entirely sure that our attempt to use Google Wave to review Google Wave has been a stunning success. But I've learned a few lessons.

First of all, if you're using it to work together on a single document, then a strong leader (backed by a decent sub-editor, adds Fildes) has to take charge of the Wave, otherwise chaos ensues. And that's me - so like it or lump it, fellow Wavers.

Second, we saw a lot of bugs that still need fixing, and no very clear guide as to how to do so. For instance, there is an "upload files" option which will be vital for people wanting to work on a presentation or similar large document, but the button is greyed out and doesn't seem to work.

Third, if Wave is really going to revolutionise the way we communicate, it's going to have to be integrated with other tools like e-mail and social networks. I'd like to tell my fellow Wavers that we are nearly done and ready to roll with this review - but they're not online in Wave right now, so they can't hear me.

And finally, if such a determined - and organised - clutch of geeks and hacks struggle to turn their ripples and wavelets into one impressive giant roller, this revolution is going to struggle to capture the imagination of the masses."

My biggest concern about Wave was the important matter of critical mass, and this is something the dot.life review hints at too. A tool like Wave is only ever going to take off if large numbers of people buy into it; if your organisation suddenly dumps all existing communication and collaboration tools in favour of Wave. It's difficult to see that happening any time soon...

Thursday, 8 October 2009

AJAX content made discoverable...soon

I follow the Official Google Webmaster Central Blog. It can be an interesting read at times, but on other occasions it provides humdrum information on how best to optimise a website, or answers questions which most of us know the answers to already (e.g. recently we had, 'Does page metadata influence Google page rankings?'). However, the latest posting is one of the exceptions. Google have just announced that they are proposing a new standard to make AJAX-based websites indexable and, by extension, discoverable to users. Good ho!

The advent of Web 2.0 has brought about a huge increase in interactive websites and dynamic page content, much of which has been delivered using AJAX ('Asynchronous JavaScript and XML', not a popular household cleaner!). AJAX is great and furnished me with my iGoogle page years ago; but increasingly websites use it to deliver page content which might otherwise be delivered using static web pages in XHTML. This presents a big problem for search engines because AJAX is currently un-indexable (if this is a word!) and a lot of content is therefore invisible to all search engines. Indeed, the latest web design mantra has been "don't publish in AJAX if you want your website to be visible". (There are also accessibility and usability issues, but these are an aside for this posting...)

The Webmaster Blog summarises:

"While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines."

Google's proposal involves shifting the responsibility of indexing the website to the administrator/webmaster of the website, whose responsibility it would be to set up a headless browser on the web server. (A headless browser is essentially a browser without a user interface; a piece of software that can access web documents but does not deliver them to human users). The headless browser would then be used to programmatically access the AJAX website on the server and provide an HTML 'snap shot' to search engines when they request it - which is a clever idea. The crux of Google's proposal is a suite of URL protocols. These would control when the search engine knows to request the headless browser information (i.e. HTML snapshot) and which URL to reveal to human users.
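To make the 'suite of URL protocols' a little more concrete, here is a minimal sketch in Python of the kind of mapping involved: a human-facing URL carrying AJAX state in its fragment is rewritten into a URL that a crawler can actually request from the server (fragments are never sent to servers). The `!` marker and the `_escaped_fragment_` parameter name are assumptions on my part and are not taken from the posting above; `crawler_url` is a hypothetical helper, not part of any Google library.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Both values below are assumptions made for illustration only.
FRAGMENT_MARKER = "!"                  # marks a fragment as crawlable AJAX state
CRAWLER_PARAM = "_escaped_fragment_"   # query parameter the crawler would send


def crawler_url(pretty_url: str) -> str:
    """Map a human-facing AJAX URL to the URL a crawler would request."""
    parts = urlsplit(pretty_url)
    if not parts.fragment.startswith(FRAGMENT_MARKER):
        # Ordinary in-page anchor (or no fragment): nothing to rewrite.
        return pretty_url
    # Move the AJAX state out of the fragment and into the query string,
    # percent-encoding anything that is unsafe inside a query value.
    state = quote(parts.fragment[len(FRAGMENT_MARKER):], safe="=")
    extra = f"{CRAWLER_PARAM}={state}"
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(crawler_url("http://example.com/page#!state=1"))
    # -> http://example.com/page?_escaped_fragment_=state=1
```

On receiving the rewritten URL, the web server would hand the request to its headless browser and return the resulting HTML snapshot, while human users continue to see only the fragment-based URL.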

It's good that Google are taking the initiative; my only concern is that they might start trying to re-write standards, as they have a little with RDFa. Their slides are below - enjoy!

Wednesday, 23 September 2009

Yahoo! is alive and kicking!

In a recent posting I discussed the partnership between Yahoo! and Microsoft and wondered whether this might bring an end to Yahoo!'s innovative information retrieval work. Yesterday the Official Yahoo! Search Blog announced big changes to Yahoo! Search. Many of these changes have been discussed in previous postings here (e.g. Search Assist, Search Pad, Search Monkey, etc.); however, Yahoo! have updated their search and results interface to make better use of these tools. As they state:

"[These changes] deliver a dynamic, compelling, and integrated experience that better understands what you are looking for so you can get things done quickly on the Web."

To us it means better integration of user query formation tools, better use of structured data on the Web (e.g. RDF data, metadata, etc.) to provide improved results and results browsing, and improved filtering tools, something which is nicely explained in their grand tour. According to their blog though, better integration of these innovations involved a serious overhaul of the Yahoo! Search technical architecture to make it run faster.

"Now, here's the best part: Rather than building this new experience on top of our existing front-end technology, our talented engineering and design teams rebuilt much of the foundational markup/CSS/JavaScript for the SRP design and core functionality completely from scratch. This allowed us to get rid of old cruft and take advantage of quite a few new techniques and best practices, reducing core page weight and render complexity in the process."

I sound like a sales officer for Yahoo!, but these improvements are really very good indeed and have to be experienced first hand. It's good to see that the intellectual capital of Yahoo! has not disappeared, and fingers-crossed it never will. True - these updates were probably already in the pipeline months before the partnership with Microsoft; but it at least demonstrates to Microsoft why Yahoo! still has the upper hand in Web search.

Thursday, 13 August 2009

Trough of disillusionment for microblogging and social software?

The IT research firm Gartner has recently published another of its technology reports for 2009: Gartner's Hype Cycle Special Report for 2009. This report is another in a long line of similar Gartner reports which do exactly what they say on the tin. That is, they provide a technology 'hype cycle' for 2009! Did you see that coming?! The technology hype cycle was a topic that Johnny Read recently discussed at an ISG research reading group, so I thought it was worth commenting on.

According to Gartner - who I believe introduced the concept of the technology hype cycle - the expectations of new or emerging technology grow far more quickly than the technology itself. This is obviously problematic since user expectations get inflated only to be deflated later as the true value of the technology slowly becomes recognised. This true value is normally reached when the technology experiences mainstream use (i.e. plateau of productivity). The figure below illustrates the basic principles behind the hype cycle model.

The latest Gartner hype cycle (below) is interesting - and interesting is really as far as you can go with this because it's unclear how the hype cycles are compiled and whether they can be used for forecasting or as a true indicator of technology trends. Nevertheless, according to the hype cycle 2009, microblogging and social networking are on the descent into the trough of disillusionment.

From a purely personal view this is indeed good news as it might mean I don't have to read about Twitter in virtually every technology newspaper, blog and website for much longer, or be exposed to a woeful interview of the Twitter CEO on Newsnight. But I suppose it is easy to anticipate the plateau of productivity for these technologies. Social software has been around for a while now, and my own experiences would suggest that many people are starting to withdraw from it; the novelty has worn off. And remember, it's not just users that perpetuate the hype cycle; those wishing to harness the social graph for directed advertising, marketing, etc. are probably sliding down the trough of disillusionment too as the promise of a captive audience has not been financially fulfilled.

It's worth perusing the Gartner report itself - interesting. The above summary hype cycle figure doesn't seem to be available in the report, so I've linked to the version available at the BBC dot.life blog which also discusses the report.

Tuesday, 4 August 2009

Extending the FOAF vocabulary for junkets, personal travel and map generation

As we know, FOAF provides a good way of exposing machine-readable data on people, their activities and interests, and the nature of their relationships with other people, groups or things. FOAF allows us to model social networks much in the same way as a social networking service might (e.g. Facebook). The big difference being that with FOAF the resultant social graph is exposed to the Semantic Web in a distributed way for machine processing (and all the goodness that this might entail…); not held in proprietary databases.

FOAF data has generally always been augmented with other RDF vocabularies. Nothing strange in this; this was anticipated, and reusing and remixing vocabularies and RDF data is a key component of the Semantic Web. My FOAF profile, for example, uses numerous additional vocabularies for enrichment, including Dublin Core, the Music Ontology, and the Contact, Relationship and Basic Geo vocabularies. The latter vocabulary (Basic Geo) provides the hook for this blog posting.

The need to provide geographical coordinates and related data in RDF was recognised early in the life of the Semantic Web, and the Basic Geo (WGS84 lat/long) Vocabulary website lists obvious applications for such data. Although including geographical data within FOAF profiles presents an obvious use (e.g. using Basic Geo to provide the latitude and longitude of, say, your office location), few people do it because few applications actually do anything with it. That was until a couple of years ago when Richard Cyganiak (DERI, National University of Ireland) developed an experimental FOAF widget (FOAF – Where Am I?) to determine geographical coordinates using the Google Maps API and then to spit it out in FOAF RDF/XML for inclusion in a FOAF profile. In his words, "there's no more excuses for [not including coordinates]". With coordinates included, FOAF profiles could be mapped using Alexandre Passant's FOAFMap.net widget (also from DERI), which was developed around the same time and extracts geographical data embedded within FOAF profiles and then maps it using Google Maps. Despite the presence of these useful widgets, FOAF profiles rarely contain location data because, let's face it, are we that interested in a precise geographical location of an office?!
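For anyone who hasn't seen location data in a FOAF profile, the following Python sketch shows a small profile fragment using Basic Geo and the sort of extraction a mapping widget would need to perform before plotting anything. The sample person and coordinates are invented for illustration, and `extract_points` is a hypothetical helper of my own; it is not how FOAFMap.net or the 'Where Am I?' widget are actually implemented. It only copes with this one nested RDF/XML layout, so a real consumer would use a proper RDF parser instead.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, FOAF and the Basic Geo (WGS84 lat/long) vocabulary.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# Invented sample profile: one person based near a single point.
SAMPLE_PROFILE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <foaf:Person>
    <foaf:name>Example Person</foaf:name>
    <foaf:based_near>
      <geo:Point>
        <geo:lat>53.4</geo:lat>
        <geo:long>-2.98</geo:long>
      </geo:Point>
    </foaf:based_near>
  </foaf:Person>
</rdf:RDF>"""


def extract_points(rdf_xml: str) -> list[tuple[str, float, float]]:
    """Return (name, latitude, longitude) for each located foaf:Person."""
    root = ET.fromstring(rdf_xml)
    points = []
    for person in root.iterfind("foaf:Person", NS):
        name = person.findtext("foaf:name", default="(unnamed)", namespaces=NS)
        for point in person.iterfind("foaf:based_near/geo:Point", NS):
            lat = point.findtext("geo:lat", namespaces=NS)
            lon = point.findtext("geo:long", namespaces=NS)
            if lat is not None and lon is not None:
                points.append((name, float(lat), float(lon)))
    return points


if __name__ == "__main__":
    print(extract_points(SAMPLE_PROFILE))
    # -> [('Example Person', 53.4, -2.98)]
```

The resulting tuples are all a map needs: a label and a pair of coordinates per marker.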

More interesting – and perhaps more useful – is to model personal travel within a FOAF profile. This is consistent with the recent emergence (within the past year or so) of 'smart travel' services on the web, the most notable of which is probably Dopplr. Dopplr essentially allows users to create, share and map details of future journeys and travel itineraries with friends, colleagues, business contacts, etc. so that overlaps can be discovered in journey patterns and important meetings arranged between busy persons. It is also consistent with the personal homepages of academics and researchers. For example, Ivan Herman's (W3C Semantic Web Activity Lead) website is one of many which include a section about upcoming trips. There are others too. From personal experience I can confirm that many an international research relationship has been struck by knowing who is going to be at the conference you are attending next week! People also like to record where they have been and why, and the 'Cities I've Visited' Facebook application provides yet another example of wanting to associate travel with a personal profile, albeit within Facebook.
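The 'overlap' idea at the heart of such services is simple enough to sketch. The Python below assumes a deliberately naive model of a trip (a traveller, a city and an inclusive date range); the `Trip` class, the function names and the sample data are all hypothetical and say nothing about how Dopplr itself works.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trip:
    """A hypothetical, minimal model of one journey."""
    traveller: str
    city: str
    start: date
    end: date  # inclusive


def overlaps(a: Trip, b: Trip) -> bool:
    """True if two different people are in the same city on at least one day."""
    return (
        a.traveller != b.traveller
        and a.city == b.city
        and a.start <= b.end
        and b.start <= a.end
    )


def find_overlaps(trips: list[Trip]) -> list[tuple[Trip, Trip]]:
    """Compare every pair of trips once and keep the coinciding ones."""
    return [
        (a, b)
        for i, a in enumerate(trips)
        for b in trips[i + 1:]
        if overlaps(a, b)
    ]


if __name__ == "__main__":
    trips = [
        Trip("alice", "Liverpool", date(2009, 8, 10), date(2009, 8, 14)),
        Trip("bob", "Liverpool", date(2009, 8, 13), date(2009, 8, 16)),
        Trip("carol", "Dublin", date(2009, 8, 13), date(2009, 8, 16)),
    ]
    for a, b in find_overlaps(trips):
        print(f"{a.traveller} and {b.traveller} coincide in {a.city}")
```

Comparing every pair is fine for a handful of contacts; anything larger would want the trips grouped by city first.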

Of course, Dopplr and Facebook applications are all well and good; but we want to expose these journeys and travel itineraries in a distributed and machine processable way - and FOAF profiles are the obvious place to do it. It is possible to use the RDF Calendar vocabulary to model some travel, but it's a little itchy and can't really tell us the purpose of a journey. Other travel ontologies exist, but they are for 'serious' travel applications and too heavy weight for a simple FOAF profile. It therefore occurred to me that there is a need for a light weight RDF travel vocabulary, ideally for use with FOAF, which can better leverage the power of existing vocabularies such as Basic Geo and RDF Calendar. I documented my original thoughts about this on my personal blog, which I use for more technical musings. Enriching a FOAF profile with such data would not only expose it to the Semantic Web and enrich social graphs, but make applications (similar to those described above) possible in an open way.

To this end I have started authoring the Travelogue RDF Vocabulary (TRAVOC). It's a pretty rough and ready approach (c'mon, 3-4 hours!) and is really just for experimental purposes; but I have published what I have so far. A formal RDF Schema is also available. Most properties entail being a foaf:Person and I have provided brief examples on my blog.

As noted, TRAVOC has been sewn together in short order. It would therefore benefit from some further consideration, refinement and (maybe) expansion. Perhaps there's a research proposal in it? Thoughts anyone? In particular, I would be interested to know if any of the TRAVOC properties overlap with existing vocabularies which I haven't been able to find. If I have time – and if and when I am satisfied with the final vocabulary - I may acquire the necessary PURLs.

Wednesday, 29 July 2009

Is it R.I.P. for Yahoo! as we know it?

And so Microsoft and Yahoo! finally agree terms of a partnership which will change the face of the web search market. Historically – and let's face it, this story has been ongoing since January 2008! – Microsoft always wanted to take over Yahoo!; but on reflection both parties probably felt that forging a partnership was most likely to give them success against the market leader. So this has to be good news, no?

Well, it'll do some good to have the dominance of Google properly challenged by the next two biggest fish - and Google will probably be concerned. But their partnership entails that Yahoo! Search be powered by Bing and, in return, Yahoo! will become the sales force for both companies' premium search advertising. We've noted recently that Bing is good and was an admirable adversary for Yahoo!, but will a Yahoo! front-end powered by a Bing back-end mean an end to some of Yahoo!'s excellent retrieval tools (often documented on this blog, see this, this, this and this, for example) and, more importantly, an end to their innovative research strategies to better harness the power of structured data on the web? Is the innovative SearchMonkey Open Search Platform to be jettisoned?

The precise details of the partnership are sketchy at the moment, but it would be tragic if this intellectual capital was to be lost or now neglected...

Thursday, 16 July 2009

Broken business models again...

In a tenuous link with several previous blog postings (this one and this one), the latest BBC dot.life posting by Cellan-Jones discusses the future of the music industry. It's an interesting summary of recent research on the music habits of the British public. Surprisingly, CDs remain by far the most popular music format, even amongst teenagers. This pleases me because – although I am a man that enjoys his eMusic downloads - I am also a chap that enjoys the CD, its artwork, its liner notes, its aesthetic qualities, etc.

Of course, the big finding that people have been latching onto is the large reduction in illegal file sharing. This is indeed good news; however, whilst many of these music fans will have switched to legal download services (e.g. iTunes, eMusic, Amazon, take your pick....), many have turned to legal streaming services like Spotify. The trouble is, as Cellan-Jones points out, Spotify is another service lacking a robust business strategy. Advertising doesn't bring home the bacon and Spotify is relying on users upgrading to their pay-for premium service. Unfortunately, nobody is. Without this revenue stream Spotify is doomed in the longer term. Nothing new in this; Spotify simply joins the growing number of Web 2.0 services that are failing to monetise their innovations.

By coincidence Guardian columnist, Paul Carr, authored an article a few days ago entitled, 'I'm calling a 'time of death' for London's internet startup industry'. The article laments the failure of London based Web 2.0 companies to experience any modicum of success or profitability. Many of his arguments have been applied elsewhere, but the London focus makes it compelling reading, particularly because Carr was around during the first dot.com boom and has personally witnessed the mysterious nature of revenue within new media. His book, 'Bringing Nothing To The Party: True Confessions Of A New Media Whore', says it all. Like Cellan-Jones, Carr also singles out Spotify, although professing to be "discreet with names". Says Carr:

"You see, the sad but true fact – and I've said this before, albeit in less aggressive terms – is that the London internet industry is increasingly, and terminally, screwed. I'll be discreet with names so as not to make things worse but since I've been back in town, I've met no fewer than three once-successful entrepreneurs who admit they're running out of money at a sickening rate (personally and professionally) with no prospect of raising more. I've seen two businesses close and one having its funding yanked suddenly because, basically, it was going nowhere fast. Everyone I speak to has the same story: investors aren't investing, revenues aren't coming, founders are being forced out – or leaving of their own accord – and no one seems to have the first idea what to do about it. Even Spotify, the current darling of London startups (which is actually from Sweden), might not be doing as well as it appears. The company says it's projecting profitability by the end of the year, with a senior staffer boasting about that fact to the geeks at the Juju event. Unfortunately, when one blogger challenged him to provide numbers to back it up, he was forced to admit that the profitability is less "projected" and more "hoped for". Meanwhile, rivals (and fellow London poster-children) Last.fm just saw all three of their founders depart the company leaving a huge hole at the top during a time of massive uncertainty. However you dress it up, that's not good."

No - it's not good; but when is the madness all going to end? Like many others, I keep on thinking the end is 'just round the corner', but it never comes. How many insane venture capitalists are left? Will it be a house of cards, and, if so, which card is going to be removed first? Perhaps a little schadenfreude is the order of the day - shall we have a sweepstake?

Tuesday, 7 July 2009

Welcome to my (Search) Pad

Search innovators at Yahoo! have today launched Search Pad. Search Pad integrates with the usual Yahoo! Search interface and allows users to take notes while conducting common information seeking tasks (e.g. researching a holiday, whether to buy that new piece of gadgetry, etc.). Search Pad can track the websites users are visiting and is invoked when it considers the user to be conducting a research task. On the Yahoo! Search Blog today:

"Search Pad helps you track sites and make notes by intelligently detecting user research intent and automatically collecting sites the user visits. Search Pad turns on automatically when you're doing research, tracking sites to make document authoring a snap. You can then quickly edit and organize your notes with the Search Pad interface, which includes drag-and-drop functionality and auto-attributed pasting."

Nice. From the website and Yahoo! blog (and this video), Search Pad is in many ways reminiscent of Listas from Microsoft Live Labs (and discussed on this blog before). It's possible to copy text, images and create lists for sharing with others, either via URL or via other services (e.g. Facebook, Twitter, Delicious). Search Pad also has an easy to use menu driven interface. Whilst it was useful in some circumstances, Listas lacked a worthy application; however, Search Pad builds on Listas functionality and instead has incorporated an improved version of it within a traditional search interface to do something we often do when we are searching (i.e. take notes about a search task).

The only problem is that I can't get it to work!! I have tried conducting a variety of 'obvious' research tasks which I anticipated Search Pad would recognise, but the Search Pad console hasn't appeared. Perhaps the 'intelligent detection' isn't as intelligent as promised? I'll keep trying, but please let me know if anyone has better luck. Still, it demonstrates the state of permanent innovation at Yahoo! Search.

Wednesday, 1 July 2009

When Web 2.0 business models and accessibility collide with information services and e-learning...

Rory Cellan-Jones has today posted his musings on the current state of Facebook at the BBC dot.life blog. His posting was inspired by an interview with Sheryl Sandberg (Chief Operating Officer) and was originally billed as 'Will Facebook ever make any money?'. Sandberg was recruited from Google last year to help Facebook turn a financial corner. According to her interview with Cellan-Jones, Facebook is still failing to break even, but her projections are that Facebook will start to turn a profit by the end of 2010. If true, this will be good news for Facebook. Not everyone believes this of course, including Cellan-Jones judging by his questions, his raised left eyebrow and his prediction that tighter EU regulation will harm Facebook growth. Says Cellan-Jones:

"And [another] person I met at Facebook's London office symbolised the firm's determination to deal with its other challenge - regulation.

Richard Allan, a former Liberal Democrat MP and then director of European government affairs at Cisco, has been hired to lobby European regulators for Facebook.

With the EU mulling over tighter privacy rules for firms that share their users' data, and with continuing concern from politicians about issues like cyber-bullying and hate-speak on social networks, there will be plenty on Mr Allan's plate.

So, yes, Facebook suddenly looks like a mature business, poised for steady progress towards profitability and ready to engage in grown-up conversations about its place in society. Then again, so did MySpace a year ago, until it suddenly went out of fashion."

This is all by way of introduction, because a few weeks ago I attended the CILIP MmIT North West day conference on 'Emerging technologies in the library' at LJMU. A series of interesting speakers, including Nick Woolley, Russell Prue and Jane Secker, pondered the use of new technologies in e-learning, digital libraries and other information services. Of course, one of the recurring themes to emerge throughout the day was the innovative use of social networking tools in e-learning or digital library contexts. To be sure, there is some innovative work going on; but none of the speakers addressed two elephants in the room:

Service longevity, and;
Accessibility

For me these are the two biggest threats to social media use within universities.

The adoption of Facebook, YouTube, MySpace, Twitter (and the rest) within universities has been rapid. Many in the literature and at conferences evangelise about the adoption of these tools as if their use was now mandatory. Nick Woolley voiced sensible concerns over this position. An additional concern that I have – and one I had hoped to verbalise at some point during the proceedings – is whether it is appropriate for services (whether e-learning or digital libraries, or whatever) to be going to the effort of embedding these technologies within curricula or services when they are third party services over which we have little control and when their economic futures are so uncertain.

The magic word at the MmIT event was 'free'. "Make use of this tool – it's free and the kids love it!". Very few of the tools which LISers and learning technologists get excited about actually have viable business models. Google lost almost $500 million on YouTube in the year up to April 2009 and is unable to turn it into a viable business. MySpace is struggling and slashing staff, Facebook's future remains uncertain, Twitter currently has no business model at all and is being propped up by venture capitalists while it contemplates desperate ways to create revenue, and so the list continues. Will any of these services still be here next year? Well published and straight talking advertising consultant, George Parker, has been pondering the state of social media advertising on his blog recently (warning – he is straight talking and profanities are the order of the day!). He has insightful comments to make though on why most of these services are never going to make spectacular amounts of money from their current (failed?) model (i.e. advertising). According to Parker, advertising is just plain wrong. Niche markets where subscriptions are required will be the only way for these services to make decent money...

A more general concern relates to the usability and accessibility of social networking services. Very few of them, if any, actually come close to meeting minimal W3C accessibility guidelines, or the requirements of the DDA and the Special Educational Needs and Disability Act (SENDA) 2001. Surely there are legal and ethical questions to be asked, particularly of universities? Embedding these third party services into curricula seems like a good idea but it's one which could potentially exclude students from the same learning experience as others. This is a concern I have had for a few years now, but I had thought that by now, a) services would have resolved it voluntarily, and, b) institutions wishing to deploy them would have taken measures to resolve it (this might be not using them at all!). Obviously not...

There are many arguments for not engaging with Web 2.0 at university, and - where appropriate - many of these arguments were cogently made at the MmIT conference. But if adopting such technologies is considered to be imperative, should we not be making more of an effort to develop tools that replicate their functionality, thus allowing control over their longevity and accessibility? Attempts at this have hitherto been pooh-poohed on the grounds that interrupting habitual student behaviour (i.e. getting students to switch from, say, Facebook to an academic equivalent) was too onerous, or that replicating the social mass and collaborative appeal of international networking sites couldn't be done within academic environments. But have we really tried hard enough? Most have been half-baked efforts. It is also noteworthy that research conducted by Mike Thelwall and published in JASIST indicates that homophily continues within social networking websites. If this is true, then it is likely that getting students to make the switch to locally hosted equivalents of Facebook or MySpace is certainly possible, particularly as the majority of their network will comprise similar people within similar academic situations.

Perhaps there is more of a need for the wider adoption of social web markup languages, such as the User Labor Markup Language (ULML), to enable users to switch between disparate social networking services whilst simultaneously allowing the portability of social capital (or 'user labour') from one service to another? This would make the decision to adopt academic equivalents far more attractive. However, if this is the case, then more research needs to be undertaken to extend ULML (and other options) to make them fully interoperable with the breadth of services currently available.

I don't like putting a downer on all the innovative and excellent work that the LIS and e-learning communities are doing in this area; it's just that many seem to be oblivious to these threats and are content to carry on regardless. Nothing good ever comes from carrying on regardless, least of all that dreadful tune by the Beautiful South. Let's just talk about it a bit more and actually acknowledge these issues...

Friday, 26 June 2009

Read all about it: interesting contributions at ISKO-UK 2009

I had the pleasure of attending the ISKO-UK 2009 conference earlier this week at University College London (UCL), organised in association with the Department of Information Studies. This was my first visit to the home of the architect of Utilitarianism, Jeremy Bentham, and the nearby St. Pancras International since it has been revamped - and what a smart train station it is.

The ISKO conference theme was 'content architecture', with a particular focus on:

"Integration and semantic interoperability between diverse resources – text, images, audio, multimedia
Social networking and user participation in knowledge structuring
Image retrieval
Information architecture, metadata and faceted frameworks"

The underlying themes throughout most papers were those related to the Semantic Web, Linked Data, and other Semantic Web inspired approaches to resolving or ameliorating common problems within our disciplines. There were a great many interesting papers delivered and it is difficult to say something about them all; however, for me, there were particular highlights (in no particular order)...

Libo Eric Si (et al.) from the Department of Information Science at Loughborough University described research to develop a prototype middleware framework between disparate terminology resources to facilitate subject cross-browsing of information and library portal systems. A lot of work has already been undertaken in this area (see for example, HILT project (a project in which I used to be involved), and CrissCross), so it was interesting to hear about his 'bag' approach in which – rather than using precise mappings between different Knowledge Organisation Systems (KOS) (e.g. thesauri, subject heading lists, taxonomies, etc.) - "a number of relevant concepts could be put into a 'bag', and the bag is mapped to an equivalent DDC concept. The bag becomes a very abstract concept that may not have a clear meaning, but based on the evaluation findings, it was widely-agreed that using a bag to combine a number of concepts together is a good idea".
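
As a rough sketch of the 'bag' idea quoted above (this is not Si et al.'s actual middleware; the vocabulary names, terms and DDC number below are invented purely for illustration), a bag can be thought of as a set of loosely related concepts that is mapped as a whole to one DDC class, which then acts as the switching point for cross-browsing:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    scheme: str  # name of the source vocabulary (hypothetical)
    label: str   # preferred term within that vocabulary


# Each bag groups loosely related concepts and maps the whole group,
# rather than each term individually, to one DDC class.
# The terms and the DDC number are illustrative assumptions.
BAGS = {
    "025.04": frozenset({
        Concept("Thesaurus A", "Information retrieval"),
        Concept("Subject headings B", "Information storage and retrieval systems"),
        Concept("Taxonomy C", "Search engines"),
    }),
}


def ddc_for(concept):
    """Return the DDC class of the bag containing the concept, or None."""
    for ddc_class, bag in BAGS.items():
        if concept in bag:
            return ddc_class
    return None


def cross_browse(concept):
    """Return concepts from other schemes that share a bag with this one."""
    ddc_class = ddc_for(concept)
    if ddc_class is None:
        return set()
    return {c for c in BAGS[ddc_class] if c.scheme != concept.scheme}


start = Concept("Thesaurus A", "Information retrieval")
related = cross_browse(start)
```

The point of the design is that no term-to-term mapping is ever asserted: two concepts are related only because they sit in the same bag, which is why the bag itself "may not have a clear meaning".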

Brian Matthews (et al.) reported on an evaluation of social tagging and KOS. In particular, they investigated ways of enhancing social tagging via KOS, with a view to improving the quality of tags and, in turn, retrieval performance. A detailed and robust methodology was provided, but essentially groups of participants were given the opportunity to tag resources using tags, controlled terms (i.e. from KOS), or terms displayed in a tag cloud, all within a specially designed demonstrator. Participants were later asked to try alternative tools in order to gather data on the nature of user preferences. There are numerous findings - and a pre-print of the paper is already available on the conference website so you can read these yourself - but the main ones, some of them surprising, can be summarised from their paper as follows:

"Users appreciated the benefits of consistency and vocabulary control and were potentially willing to engage with the tagging system;
There was evidence of support for automated suggestions if they are appropriate and relevant;
The quality and appropriateness of the controlled vocabulary proved to be important;
The main tag cloud proved problematic to use effectively; and,
The user interface proved important along with the visual presentation and interaction sequence."

The user preference for controlled terms was reassuring. In fact, as Matthews et al. report:

"There was general sentiment amongst the depositors that choosing terms from a controlled vocabulary was a "Good Thing" and better than choosing their own terms. The subjects could overall see the value of adding terms for information retrieval purposes, and could see the advantages of consistency of retrieval if the terms used are from an authoritative source."

Chris Town from the University of Cambridge Computer Laboratory presented two (see [1], [2]) equally interesting papers relating to image retrieval on the Web. Although images and video now comprise the majority of Web content, the vast majority of retrieval systems essentially use text, tags, etc. that surround images in order to make assumptions about what the image might be. Of course, using any major search engine we discover that this approach is woefully inaccurate. Dr. Town has developed improved approaches to content-based image retrieval (CBIR) which provide a novel way of bridging the 'semantic gap' between the retrieval model used by the system and that of the user. His approach is founded on the "notion of an ontological query language, combined with a set of advanced automated image analysis and classification models". This approach has been so successful that he has founded his own company, Imense. The difference in performance between Imense and Google is staggering and has to be seen to be believed. Examples can be found in his presentation slides (which will be on the ISKO website soon), but can also be observed from simply messing around on the Imense Picture Search.

Chris Town's second paper essentially explored how best to do the CBIR image processing required for the retrieval system. According to Dr. Town there are approximately 20 billion images on the web, with the majority at a high resolution, meaning that by his calculation it would take 4000 years to undertake the necessary CBIR processing to facilitate retrieval! Phew! Large-scale grid computing options therefore have to be explored if the approach is to be scalable. Chris Town and his colleague Karl Harrison therefore undertook a series of CBIR processing evaluations by distributing the required computational task across thousands of Grid nodes. This distributed approach resulted in the processing of over 25 million high resolution images in less than two weeks, thus making grid processing a scalable option for CBIR.
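
The figures quoted above can be sanity-checked with some back-of-envelope arithmetic. This is only a sketch: it assumes a constant processing cost per image and that the 4000-year figure refers to a single processor working serially, neither of which is stated in the paper summary above.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Figures quoted from the talk.
web_images = 20e9        # approximate number of images on the web
serial_years = 4000      # estimated time to process them all
batch_images = 25e6      # images processed in the Grid evaluation
batch_days = 14          # "less than two weeks", taken here as an upper bound

# Implied cost per image if the work is done serially (an assumption).
per_image_seconds = serial_years * SECONDS_PER_YEAR / web_images

# Serial time the 25 million image batch would need at that rate.
serial_years_for_batch = batch_images * per_image_seconds / SECONDS_PER_YEAR

# Average number of nodes that must be busy to finish within two weeks.
implied_parallelism = (serial_years_for_batch * 365.25) / batch_days

print(f"{per_image_seconds:.1f} s per image")
print(f"{serial_years_for_batch:.1f} serial years for the batch")
print(f"about {implied_parallelism:.0f} nodes busy on average")
```

On these assumptions each image costs roughly six seconds, the batch represents about five years of serial work, and finishing it inside two weeks needs something over a hundred nodes working continuously, which is comfortably within reach of a Grid with thousands of nodes.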

Andreas Vlachidis (et al.) from the Hypermedia Research Unit at the University of Glamorgan described the use of 'information extraction' techniques employing Natural Language Processing (NLP) techniques to assist in the semantic indexing of archaeological text resources. Such 'Grey Literature' is a good test bed as more established indexing techniques are insufficient in meeting user needs. The aim of the research is to create a system capable of being "semantically aware" during document indexing. Sounds complicated? Yes – a little. Vlachidis is achieving this by using a core cultural heritage ontology and the English Heritage Thesauri to support the 'information extraction' process and to provide "a semantic framework in which indexed terms are capable of supporting semantic-aware access to on-line resources".

Perhaps the most interesting aspect of the conference was that it was well attended by people outside the academic fraternity, and as such there were papers on how these organisations are doing innovative work with a range of technologies, specifications and standards which, to a large extent, remain the preserve of researchers and academics. Papers were delivered by technical teams at the World Bank and Dow Jones, for example. Perhaps the most interesting contribution from the 'real world' though was that delivered by Tom Scott, a key member of the BBC's online and technology team. Tom is a key proponent of the Semantic Web and Linked Data at the BBC and his presentation threw light on BBC activity in this area – and rather coincidentally complemented an accidental discovery I made a few weeks ago.

Tom currently leads the BBC Earth project which aims to bring more of the BBC's Natural History content online and bring the BBC into the Linked Data cloud, thus enabling intelligent linking, re-use, re-aggregation, with what's already available. He provided interesting examples of how the BBC was exposing structured data about all forms of BBC programming on the Web by adopting a Linked Data approach and he expressed a desire for users to traverse detailed and well connected RDF graphs. Says Tom on his blog:

"To enable the sharing of this data in a structured way, we are using the linked data approach to connect and expose resources i.e. using web technologies (URLs and HTTP etc.) to identify and link to a representation of something, and that something can be person, a programme or an album release. These resources also have representations which can be machine-processable (through the use of RDF, Microformats, RDFa, etc.) and they can contain links for other web resources, allowing you to jump from one dataset to another."

Whilst Tom conceded that this work is small compared to the entire output and technical activity at the BBC, it still constitutes a huge volume of data and is significant owing to the BBC's pre-eminence in broadcasting. Tom even reported that a SPARQL end point will be made available to query this data. I had actually hoped to ask Tom a few questions during the lunch and coffee breaks, but he was such a popular guy that in the end I lost my chance, such is the existence of a popular techie from the Beeb.

Pre-print papers from the conference are available on the proceedings page of the ISKO-UK 2009 website; however, fully peer reviewed and 'added value' papers from the conference are to be published in a future issue of Aslib Proceedings.

Tuesday, 16 June 2009

11 June 2009: the day Common Tags was born and collaborative tagging died?

Mirroring the emergence of other Web 2.0 concepts, 2004-2006 witnessed a great deal of hyperbole about collaborative tagging (or 'folksonomies' as they are sometimes known). It is now 2009 and most of us know what collaborative tagging is so I'll avoid contributing to the pile of definitions already available. The hype subsided after 2006 (how active is Tagsonomy now?), but the implementation of tagging within services of all types didn't; tagging became and is ubiquitous.

The strange thing about collaborative tagging is that when it emerged the purveyors of its hype (e.g. Clay Shirky in particular, but there were many others) drowned out the comments made by many in the information, computer and library sciences. The essence of these comments was that collaborative tagging broke so many of the well established rules of information retrieval that it would never really work in general resource discovery contexts. In fact, collaborative tagging was so flawed on a theoretical level that further exploration of its alleged benefits was considered futile. Indeed, to this day, research has been limited for this reason, and I recall attending a conference in Bangalore in which lengthy discussions ensued about tagging being ineffective and entirely unscalable. For the tagging evangelists though, these comments simply provided proof that these communities were 'stuck-in-their-way' and harboured an unwillingness to break with theoretical norms. One of the most irritating aspects of the position adopted by the evangelists was that they relied on the power of persuasion and were never able to point to evidence. Moreover, even their powers of persuasion were lacking because most of them were generally 'technology evangelists' with no real understanding of the theories of information retrieval or knowledge organisation; they were simply being carried along by the hype.

The difficulties surrounding collaborative tagging for general resource discovery are multifarious and have been summarised elsewhere; but one of the intractable problems relates to the lack of vocabulary control or collocation and the effect this has on retrieval recall and precision. The Common Tags website summarises the root problem in three sentences (we'll come back to Common Tags in a moment…):

"People use tags to organize, share and discover content on the Web. However, in the absence of a common tagging format, the benefits of tagging have been limited. Individual things like New York City are often represented by multiple tags (like 'nyc', 'new_york_city', and 'newyork'), making it difficult to organize related content; and it isn’t always clear what a particular tag represents—does the tag 'jaguar' represent the animal, the car company, or the operating system?"

These problems have been recognised since the beginning and were anticipated in the theoretical arguments posited by those in our communities of practice. Research has therefore focused on how searching or browsing tags can be made more reliable for users, either by structuring them, mapping them to existing knowledge structures, or using them in conjunction with other retrieval tools (e.g. supplementing tools based on automatic indexing). In short, tags in themselves are of limited use and the trend is now towards taming them using tried and tested methods. For advocates of Web 2.0 and the social ethos it often promotes, this is really a reversal of the tagging philosophy - but it appears to be necessary.

The root difficulty relates to use of collaborative tagging in Personal Information Management (PIM). Make no bones about it, tagging originally emerged as a PIM tool and it is here that it has been most successful. I, for example, make good use of BibSonomy to organise my bookmarks and publications. BibSonomy might be like delicious on steroids, but one of its key features is the use of tags. In late 2005 I submitted a paper to the WWW2006 Collaborative Tagging Workshop with a colleague. Submitted at the height of tagging hyperbole, it was a theoretical paper exploring some of the difficulties with tagging as a general resource discovery tool. In particular, we aimed to explore the difficulties in expecting a tool optimised for PIM to yield benefits when used for general resource discovery and we noted how 'PIM noise' was being introduced into users' results. How could tags that were created to organise a personal collection be expected to provide a reasonable level of recall, let alone precision? Unfortunately it wasn't accepted; but since it scored well in peer review I like to think that the organising committee were overwhelmed by submissions!! (It is also noteworthy that no other collaborative tagging workshops have been held since.)

Nevertheless, the basic thesis remains valid. It is precisely this tension (i.e. PIM vs. general resource discovery) which has compromised the effectiveness of collaborative tagging for anything other than PIM. Whilst patterns can be observed in collaborative tagging behaviour, we generally find that the problems summarised in the Common Tags quote above are insurmountable – and this simply because tags are used for PIM first and foremost, and often tell us nothing about the intellectual content of the resource ('toPrint' anyone? 'toRead', 'howto', etc.). True – users of tagging systems can occasionally discover similar items tagged by other users. But how useful is this and how often do you do it? And how often do you search tags? I never do any of these things because the results are generally feeble and I'm not particularly interested in what other people have been tagging. Is anyone? So whilst tags have taken off in PIM, their utility in facilitating wider forms of information retrieval has been quite limited.

Common Tags

Last Friday the Common Tags initiative was officially launched. Common Tags is a collaboration between some established Web companies and university research centres, including DERI at the National University of Ireland and Yahoo!. It is an attempt to address the multifarious problems above and to widen the use of tags. Says the Common Tags website:

"The Common Tag format was developed to address the current shortcomings of tagging and help everyone—including end users, publishers, and developers—get more out of Web content. With Common Tag, content is tagged with unique, well-defined concepts – everything about New York City is tagged with one concept for New York City and everything about jaguar the animal is tagged with one concept for jaguar the animal. Common Tag also provides access to useful metadata that defines each concept and describes how the concepts relate to one another. For example, metadata for the Barack Obama Common Tag indicates that he's the President of the United States and that he’s married to Michelle Obama."

Great! But how is Common Tags achieving this? Answer: RDFa. What else? Common Tags enables each tag to be defined using a concept URI taken from Freebase or DBpedia (much like more formal methods, e.g. SKOS/RDF) thus permitting the unique identification of concepts and ameliorating some of our resource discovery problems (see Common Tags workflow diagram below). A variety of participating social bookmarking websites will also enable users to bookmark using Common Tags (e.g. ZigTag, Faviki, etc.). In short, Common Tags attempts to Semantic Web-ify tags using RDFa/XHTML compliant web pages and in so doing makes tags more useful in general resource discovery contexts. Faviki even describes them as Semantic Tags and employs the logo strap line, 'tags that make sense'. Common Tags won't solve everything but at least we will see some improvement in recall and increased precision in certain circumstances, as well as the benefits of Semantic Web integration.
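
To make the idea concrete, here is a minimal sketch of what binding tags to concept URIs buys us. It does not reproduce the Common Tag RDFa markup itself; the lookup table, the 'sense' labels and the example.org pages are hypothetical, and the DBpedia URIs are simply plausible identifiers chosen for illustration.

```python
from collections import defaultdict

NYC = "http://dbpedia.org/resource/New_York_City"

# Hypothetical lookup: free-text tag -> {sense: concept URI}.
# Variant spellings share one URI; an ambiguous tag has one URI per sense.
CONCEPTS = {
    "nyc": {"default": NYC},
    "new_york_city": {"default": NYC},
    "newyork": {"default": NYC},
    "jaguar": {
        "animal": "http://dbpedia.org/resource/Jaguar",
        "car company": "http://dbpedia.org/resource/Jaguar_Cars",
    },
}


def resolve(tag, sense="default"):
    """Return the concept URI for a tag, or None if it cannot be resolved."""
    return CONCEPTS.get(tag.lower(), {}).get(sense)


def collocate(taggings):
    """Group pages by concept URI; taggings are (page, tag, sense) tuples."""
    by_concept = defaultdict(set)
    for page, tag, sense in taggings:
        uri = resolve(tag, sense)
        if uri is not None:
            by_concept[uri].add(page)
    return dict(by_concept)


taggings = [
    ("http://example.org/a", "nyc", "default"),
    ("http://example.org/b", "NewYork", "default"),
    ("http://example.org/c", "jaguar", "animal"),
    ("http://example.org/d", "jaguar", "car company"),
]
grouped = collocate(taggings)
```

Pages tagged 'nyc' and 'NewYork' are collocated under one concept (better recall), while the two 'jaguar' pages are kept apart (better precision). Note that 'jaguar' without a chosen sense resolves to nothing, which is exactly the ambiguity that a plain tag cannot express.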

So, in summary, collaborative tagging hasn't died, but at least now - at long last - it might become useful for something other than PIM. There is irony in the fact that formal description methods have to be used to improve tag utility, but will the evangelists see it? Probably not.

Friday, 12 June 2009

Serendipity reveals ontological description of BBC programmes

I have been enjoying Flight of the Conchords on BBC Four recently. Unfortunately, I missed the first couple of episodes of the new series. So that I could configure my Humax HDR to record all future episodes, I visited the BBC website to access their online schedule. It was while doing this that I discovered visible usage of the BBC's Programmes Ontology. The programme title (i.e. Flight of the Conchords) is hyperlinked to an RDF file on this schedule page.

The Semantic Web is supposed to provide machine readable data, not human readable data, and hyperlinking to an RDF/XML file is clearly a temporary glitch at the Beeb. After all, 99.99% of BBC users clicking on these links would be hoping to see further details about the programme, not to be presented with a bunch of angled brackets. Nevertheless, this glitch provides an interesting insight for us since it reveals the extent to which RDF data is being exposed on the Semantic Web about BBC programming, and the vocabularies the BBC are using. Researchers at the BBC are active in dissemination (e.g. ESWC2009, XTech 2008), but it's not often that you serendipitously discover this sort of stuff in action at an organisation like this.

The Programmes Ontology is based significantly on the Music Ontology Specification and the FOAF Vocabulary Specification, but their data also deploys – admittedly not in the example below, except in the namespace declarations – Dublin Core and SKOS.

Oh, and the next episode of Flight of the Conchords is on tonight at 23:00, BBC Four.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf = "http://xmlns.com/foaf/0.1/"
xmlns:po = "http://purl.org/ontology/po/"
xmlns:mo = "http://purl.org/ontology/mo/"
xmlns:skos = "http://www.w3.org/2008/05/skos#"
xmlns:time = "http://www.w3.org/2006/time#"
xmlns:dc = "http://purl.org/dc/elements/1.1/"
xmlns:dcterms = "http://purl.org/dc/terms/"
xmlns:wgs84_pos= "http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:timeline = "http://purl.org/NET/c4dm/timeline.owl#"
xmlns:event = "http://purl.org/NET/c4dm/event.owl#">

<rdf:Description rdf:about="/programmes/b00l22n4.rdf">
<rdfs:label>Description of the episode Unnatural Love</rdfs:label>
<dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-06-02T00:14:09+01:00</dcterms:created>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-06-02T00:14:09+01:00</dcterms:modified>
<foaf:primaryTopic rdf:resource="/programmes/b00l22n4#programme"/>
</rdf:Description>

<po:Episode rdf:about="/programmes/b00l22n4#programme">
<dc:title>Unnatural Love</dc:title>
<po:short_synopsis>Jemaine accidentally goes home with an Australian girl he meets at a nightclub.</po:short_synopsis>
<po:medium_synopsis>Comedy series about two Kiwi folk musicians in New York. When Bret and Jemaine go out nightclubbing with Dave, Jemaine accidentally goes home with an Australian girl.</po:medium_synopsis>
<po:long_synopsis>When Bret and Jemaine go out nightclubbing with Dave, Jemaine accidentally goes home with an Australian girl. At first plagued by shame and self-doubt, he comes to care about her, much to Bret and Murray's annoyance. Can their love cross the racial divide?</po:long_synopsis>
<po:masterbrand rdf:resource="/bbcfour#service"/>
<po:position rdf:datatype="http://www.w3.org/2001/XMLSchema#int">5</po:position>
<po:genre rdf:resource="/programmes/genres/comedy/music#genre" />
<po:genre rdf:resource="/programmes/genres/comedy/sitcoms#genre" />
<po:version rdf:resource="/programmes/b00l22my#programme" />
</po:Episode>

<po:Series rdf:about="/programmes/b00kkptn#programme">
<po:episode rdf:resource="/programmes/b00l22n4#programme"/>
</po:Series>

<po:Brand rdf:about="/programmes/b00kkpq8#programme">
<po:episode rdf:resource="/programmes/b00l22n4#programme"/>
</po:Brand>
</rdf:RDF>
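
To show what 'machine readable' means in practice, here is a small sketch that pulls the episode details out of the RDF above using only Python's standard library. The embedded XML is a trimmed copy of the BBC sample (XML declaration and unused namespaces removed). Treating RDF/XML as plain XML only works because this particular serialisation is so regular; a proper RDF parser would be the safer choice for real data.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the BBC sample above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "po": "http://purl.org/ontology/po/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF = "{" + NS["rdf"] + "}"  # prefix for rdf:about / rdf:resource attributes

# Trimmed copy of the episode description shown above.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:po="http://purl.org/ontology/po/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <po:Episode rdf:about="/programmes/b00l22n4#programme">
    <dc:title>Unnatural Love</dc:title>
    <po:masterbrand rdf:resource="/bbcfour#service"/>
    <po:position rdf:datatype="http://www.w3.org/2001/XMLSchema#int">5</po:position>
    <po:genre rdf:resource="/programmes/genres/comedy/music#genre"/>
    <po:genre rdf:resource="/programmes/genres/comedy/sitcoms#genre"/>
  </po:Episode>
</rdf:RDF>"""

root = ET.fromstring(RDF_XML)
episode = root.find("po:Episode", NS)

episode_uri = episode.get(RDF + "about")
title = episode.findtext("dc:title", namespaces=NS)
position = int(episode.findtext("po:position", namespaces=NS))
genres = [g.get(RDF + "resource") for g in episode.findall("po:genre", NS)]

print(title, position, genres)
```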

Quasi-facetted retrieval of images using emotions?

As part of my literature catch up I found an extremely interesting paper in JASIST by S. Schmidt and Wolfgang G. Stock entitled, 'Collective indexing of emotions in images : a study in emotional information retrieval'. The motivation behind the research is simple: images tend to elicit emotional responses in people. Is it therefore possible to capture these emotional responses and use them in image retrieval?

An interesting research question indeed, and Schmidt and Stock's study found that 'yes', it is possible to capture these emotional responses and use them. In brief, their research asked circa 800 users to tag a variety of public images from Flickr using their scroll-bar tagging system. This scroll-bar tagging system allowed users to tag images according to a series of specially selected emotional responses and to indicate the intensity of these emotions. Schmidt and Stock found that users tended to have favourite emotions and this can obviously differ between users; however, for a large proportion of images the consistency of emotion tagging is very high (i.e. a large proportion of users frequently experience the same emotional response to an image). It's a complex area of study and their paper is recommended reading precisely for this reason (capturing emotions anyone?!), but their conclusions suggest that:

"…it seems possible to apply collective image emotion tagging to image information systems and to present a new search option for basic emotions."

To what extent does the image above (by D Sharon Pruitt) make you feel happiness, anger, sadness, disgust or fear? It is early days, but the future application of such tools could find a place within the growing suite of image filters that many search engines have recently unveiled. For example, yesterday Keith Trickey was commenting on the fact that the image filters in Bing are better than those in Google or Yahoo!. True. There are more filters, and they seem to work better. In fact, they provide a species of quasi-taxonomical facets: (by) size, layout, color, style and people. It's hardly Ranganathan's PMEST, but – keeping in mind that no human intervention is required - it's a useful quasi-faceted way of retrieving or filtering images, albeit flat.

An emotional facet, based on Schmidt and Stock's research, could easily be added to systems like Bing. In the medium term it is Yahoo! that will be more in a position to harness the potential of emotional tagging. They own Flickr and have recently incorporated the searching and filtering of Flickr images within Yahoo! Image Search. As Yahoo! are keen for us to use Image Search to find CC images for PowerPoint presentations, or to illustrate a blog, being able to filter by emotions would be a useful addition to the filtering arsenal.
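
For what it's worth, an emotional facet of this kind could be sketched along the following lines. This is my own illustration and not Schmidt and Stock's method: the 0 to 10 intensity scale, the use of mean and standard deviation as the 'consistency' test, the thresholds and the sample data are all assumptions.

```python
from statistics import mean, pstdev

# The five basic emotions mentioned above.
EMOTIONS = ("happiness", "anger", "sadness", "disgust", "fear")


def rating(**intensities):
    """One user's slider values for one image; unset emotions default to 0."""
    return {emotion: intensities.get(emotion, 0) for emotion in EMOTIONS}


def filter_by_emotion(images, emotion, min_mean, max_spread):
    """Return ids of images that users rate strongly AND consistently.

    An image passes if its mean intensity for the emotion is at least
    min_mean and the standard deviation across users is at most max_spread.
    """
    hits = []
    for image_id, user_ratings in images.items():
        values = [r[emotion] for r in user_ratings]
        if mean(values) >= min_mean and pstdev(values) <= max_spread:
            hits.append(image_id)
    return sorted(hits)


# Invented sample data: three users rate each of two images.
IMAGES = {
    "img1": [rating(happiness=8), rating(happiness=9), rating(happiness=7)],
    "img2": [rating(happiness=1), rating(happiness=9),
             rating(happiness=5, sadness=6)],
}

happy_images = filter_by_emotion(IMAGES, "happiness", min_mean=6, max_spread=1.5)
```

Here only the first image would be offered under a 'happiness' facet, because users agree on it; the second is dropped since its ratings are both weaker on average and widely scattered.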