Wednesday 27 May 2009

Image searching with Creative Commons

Student information literacy skills have been discussed on the blog before. In short, they are woeful. One area where students tend to have little understanding is intellectual property rights (IPR). The situation might be looking better for digital music, but in my experience it remains poor for other digital artefacts, particularly images. 'Twas only a few weeks ago, while I was in a lab with some undergraduate students for a web technologies module, that I discovered most of them were ripping images from the web for inclusion within their information gateways. While this can (in some circumstances) be tolerated within the confines of an educational institution, it remains copyright infringement owing to copying by 'reprographic means' - and this isn't behaviour we want to become habitual in our graduates. My brother (a graphic designer and new media guru) has spun me many a yarn about ex-colleagues who have been shown their P45 for engaging in IPR theft (e.g. reusing someone's basic design or photograph).

All of this is veering away from the original reason for this blog though, which is to draw attention to some new image searching functionality on Yahoo! Image Search. Following on nicely from the Search Options post, the Yahoo! Search Blog has just announced the inclusion of some extra search filters for image result sets. Not only is it better than Google (and more accurate?), but it also includes a useful Creative Commons (CC) filter. Using a similar interface to Yahoo! Search Assist, Yahoo! Image Search allows users to apply a CC checkbox to filter for images, with specific filters included for commercial reuse and/or remixing. This is particularly useful to embellish those PowerPoint presentations or to illustrate a blog, or for those undergraduate students building an information gateway, or to avoid getting a P45!

There appears to be a downside, unfortunately. When I saw the Yahoo! Search Blog announcement I thought (perhaps naively) that Yahoo! was starting to put into practice its commitment to metadata, Semantic Web specifications, and other structured data. Since I know my personal homepage is indexed by Yahoo! and uses XHTML+RDFa to notify intelligent agents that its page content falls under a Creative Commons Attribution 3.0 License, I thought I'd put an Image Search to the test. Providing the CC namespace is referenced, the XHTML+RDFa required is simple. For example:

<p>Content on <a href="http://www.staff.ljmu.ac.uk/bsngmacg/" property="cc:attributionName" rel="cc:attributionURL">George Macgregor</a>'s website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 License</a></p>

...and with specific CC reference to my foaf:depiction...

<img src="img/georgedepiction.jpg" alt="Image of George Macgregor" rel="license" href="http://creativecommons.org/licenses/by/3.0/" property="foaf:depiction" content="George Macgregor"/>
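For the curious, here is a rough idea of how little work an 'intelligent agent' would need to do to spot this kind of licence statement. It is only a minimal sketch in Python using the standard library's html.parser: it looks for any element carrying rel="license" and reports the licence URL. It is not a full RDFa processor (it ignores namespaces, subjects and the cc: properties), the names LicenseFinder and find_licenses are my own inventions for illustration, and I make no claim that Yahoo! or anyone else does it this way.

```python
from html.parser import HTMLParser


class LicenseFinder(HTMLParser):
    """Collect (tag, licence URL) pairs for elements carrying rel="license".

    Hypothetical helper for illustration only; not a full RDFa parser.
    """

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="license nofollow"
        rel_values = (attributes.get("rel") or "").split()
        if "license" in rel_values and attributes.get("href"):
            self.licenses.append((tag, attributes["href"]))


def find_licenses(markup):
    """Return a list of (tag, licence URL) tuples found in the markup."""
    finder = LicenseFinder()
    finder.feed(markup)
    finder.close()
    return finder.licenses


# Sample markup abridged from the two snippets above.
sample = (
    '<p>Content is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'Creative Commons Attribution 3.0 License</a></p>'
    '<img src="img/georgedepiction.jpg" alt="Image of George Macgregor" '
    'rel="license" href="http://creativecommons.org/licenses/by/3.0/"/>'
)

for tag, url in find_licenses(sample):
    print(tag, url)
```

Run against the sample, this reports the licence URL once for the link and once for the image, which is roughly the signal a CC filter would need before deciding whether an image qualifies.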

My filtered CC search was unsuccessful though. This disappointed me; but then I observed the following notice:
"Note: Only Flickr images are supported currently."
Flickr – which is a subsidiary of Yahoo! – has allowed users to conduct advanced searches of its publicly uploaded images for quite some time. This has included CC searching. And it would appear that Yahoo! has integrated Flickr searching functionality into Image Search, albeit with some nice tweaks. If I had read their blog in its entirety I would have realised this; I just clicked the link such was my excitement about Yahoo! Image Search!

It's useful to have this functionality within a conventional searching tool, but it is disappointing that Image Search isn't using cleverer means of doing it (e.g. RDFa) rather than relying on the preferences of Flickr users when they upload their images. Don't get me wrong, this is useful and most welcome, and it will save me time on occasion, but it would be exciting to crack CC image searching beyond the controlled Flickr environment. Hopefully the 'currently' in "Only Flickr images are supported currently" will mean that my expectations will be met soon…

Monday 18 May 2009

Light relief: Celtic fringes erased by WolframAlpha?!

With WolframAlpha launched on Friday, I spent much of my weekend trying to get a 'computable request' to compute. Not until Monday morning did a request compute – but its performance has been getting better ever since, so hopefully we will all have more time to experiment with it over the coming days and weeks...

Like me, Gwenda Mynott has been testing WolframAlpha and has been searching for topics that a) you know well, and b) WolframAlpha can easily compute. Places are good for this (e.g. countries, towns, cities, etc.), and Stephen Wolfram computes multiple locations to good effect in his demonstrations; however, Gwenda tried to 'compute' Wales and arrived at some bizarre results. Check them out. WolframAlpha doesn't retrieve data pertaining to the constituent nation of the United Kingdom of Great Britain and Northern Ireland (i.e. Wales as you or I would tend to know it!), but rather a small town in South Yorkshire by the name of Wales (?). The only other obvious option WolframAlpha provides is Wales (New York, USA), which is equally amiss.

Hmmmmm. If this is the result for Wales, what are the results for the rest of the UK? Well, they're equally controversial. England appears to be synonymous with the United Kingdom of Great Britain and Northern Ireland. Scotland is redirected by WolframAlpha to the Kingdom of Scotland, which ceased to exist with the Acts of Union in 1707. Worse than that, Northern Ireland doesn't even exist! ("WolframAlpha isn't sure what to do with your input") Cornish nationalists will also be dismayed to learn that Cornwall (Canada) is the only one that counts.

Is this a systematic attempt to erase the history, culture and memory of the Celtic fringes?! Of course not. The results might be strange, but from a knowledge engine point of view – and ontologically speaking - Wales, Scotland and Northern Ireland are subsumed by the larger geographical and political entity of the UK, so it's understandable that WolframAlpha computes the answer in this way. Still, the England/UK synonymy is a bit odd and must have been encoded by someone somewhere sometime!

Experiment away, folks - and I would encourage everyone to post their most bizarre / illogical data results as comments to this blog. A prize will go to the most outlandish!

Friday 15 May 2009

Some more 'Search Options'...

I promised not to blog about Google any time soon for fear the blog becomes known as the unofficial Google blog. After some consideration I thought, 'pish posh!' Anyway, the post has a wider remit than just Google...honest!

The absence of retrieval aids for Google users (oh no, not again - I hear you cry!) has been discussed at great length on this blog before. To appreciate the extent of this deficiency we need only peruse some innovative rival search engines such as Ask (recently re-branded back to Ask Jeeves), Yahoo!, or Clusty. Google has been making changes though and today the Official Google Blog announced some further enhancements to the universal Google search interface. Simply called, 'Search Options', these tools let you "slice and dice" results, apply rudimentary filters, and generate alternative views of results. Search Options does a little bit more to help the user in query formulation (the area where I think Google is weakest), but also offers some useful functionality once you have your results.



Check out our usual canned search for 'communism in Russia'; click the 'Search options…' link in the top left-hand corner of the interface to reveal the Search Options tools.

Filters are available for videos, forums and reviews (the latter being fairly useful if you are shopping). Various publication time filters are also available. Nothing here is particularly mind blowing though.

Search Options gets a bit more interesting when the search display options are explored in a little detail. Firstly, it's possible to request details of related searches. These are displayed in a better page location than before and look similar to Yahoo! Search Assist. But it is now also possible to select the 'Wonder Wheel' which generates a visualisation of the related terms. I'm unsure how useful the Wonder Wheel really is, particularly as the true nature of the relationships between terms is impossible for Google to represent other than in syntactic terms; this is something the Semantic Web community is obviously trying to resolve.

Most interesting though is the 'Timeline' tool. This allows results to be displayed along, erm, yup, a timeline. The timeline is clickable, allowing the user to drill down into particular temporal zones and to view resources relating to that zone. I use the word 'interesting' because although the timeline is probably quite useful for historical research, its moment of introduction is the most interesting part. Indeed, the timeline functionality looks in part like Google is bracing itself for the release of WolframAlpha, which is due any day now (or tonight?) – and I wouldn't be at all surprised if this announcement was an attempt to steal some of its thunder. This appears to have been combined with the demonstration of Google Squared at the Google Searchology conference a few days ago. No Google Squared prototypes appear to be available for us to experiment with, but TechCrunch got a sneak peek at Searchology (view the YouTube video below). Google Squared is, in essence, Google's answer to WolframAlpha.

For me the most interesting news to emerge alongside Search Options is Google's desire to make greater use of RDFa. RDFa is probably a little pedestrian for me, but it's better than nothing – and at least there is a clear intention of using some Semantic Web specifications. It's just a shame Yahoo! announced something similar but more radical almost 18 months ago.

Friday 1 May 2009

LCSH as Linked Data ... officially!

Yesterday was, in my estimation, pretty historic. The Library of Congress officially launched the LC Authorities and Vocabularies service. You might recall a previous post relating to lcsh.info in which I lamented the LC's decision to pull down a SKOS demonstrator of LCSH, explicitly designed to explore the possibilities of Linked Data and dereferenceable URIs. All the background is in the previous post; but the whole episode appears to have been a PR disaster for LC.

The great news is that the LC Authorities and Vocabularies service (let's call it LCAV henceforth, shall we?) officially re-launches lcsh.info in a bigger, better and much improved form. The service essentially enables both humans and machines to access a plethora of LC authority data. Like lcsh.info, the service takes a Semantic Web approach, publishing this data as Linked Data on the Web via dereferenceable URIs.

Five minutes exploring the website reveals that LCAV serves up the entire LCSH for free, with incredible search and browse functionality, leaving Connexion in the shade. The concept URIs point to detailed data modelled in SKOS as RDFa for human readability, but with links to SKOS as RDF/XML, N-Triples and (the less familiar?) JSON for machine processing. RDF graphs can even be visualised by clicking, well, the 'visualize' tab – incredible. Mappings to other vocabularies are also provided.
On top of all this, LCSH can be downloaded in its entirety as RDF/XML or N-Triples (SKOS)! LCAV also indicates that further authority data will be made available soon.
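
To give a flavour of why N-Triples is so pleasant for machine processing, here is a minimal sketch in Python (standard library only) that pulls skos:prefLabel values out of N-Triples text. The function name pref_labels and the example.org URIs are my own assumptions for illustration and are not taken from the LCAV data. The pattern only handles simple literal triples and does not decode escape sequences, so a proper RDF library would be the sensible choice for real work.

```python
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches: <subject> <predicate> "literal"@lang .   (language tag optional)
# Triples whose object is a URI are deliberately not matched.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?\s*\.\s*$'
)


def pref_labels(ntriples_text):
    """Map each concept URI to its skos:prefLabel (hypothetical helper)."""
    labels = {}
    for line in ntriples_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == SKOS_PREF_LABEL:
            labels[match.group(1)] = match.group(3)
    return labels


# Illustrative triples only; the example.org URIs are placeholders, not LCSH data.
sample = '''
<http://example.org/concept/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Semantic Web"@en .
<http://example.org/concept/1> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/concept/2> .
<http://example.org/concept/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "World Wide Web"@en .
'''

for uri, label in pref_labels(sample).items():
    print(uri, label)
```

One triple per line is what makes this kind of quick-and-dirty processing possible; the same job against RDF/XML would need a proper XML parser.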

Make no bones about it, this is historic stuff, not only because the service is so good but because this terminological data is no longer locked down. I think it's important to stroke our imaginary beards over the significance of the LC's change of direction. Is this the beginning of the end for locked down terminological data?! Will they be like dominoes henceforth? A fiver says DDC does the same by the end of the year. Any takers???