Wednesday 24 December 2008

SKOS-ifying Knowledge Organisation Systems: a continuing contradiction for the Semantic Web

A few days ago Ed Summers announced on his blog that he was shutting down lcsh.info. For those who don't know, lcsh.info was a Semantic Web demonstrator developed by Ed with the express purpose of illustrating how the Library of Congress Subject Headings (LCSH) could be represented, and its structure harnessed, using the Simple Knowledge Organisation System (SKOS). In particular, Ed was keen to explore issues pertaining to Linked Data and the representation of concepts using URIs. He even hoped that the URIs used would be Cool URIs, linking eventually to a bona fide LCSH service were one ever to be released. Sadly, it was not to be... The reasons remain unclear but were presumably related to IPR. As the lcsh.info blog entry notes, Ed was compelled to remove it by the Library of Congress itself. The fact that he was the LC's resident Semantic Web buff probably didn't help matters.

SKOS falls within my area of interest and is an initiative of the Semantic Web Deployment Working Group. In brief, SKOS is an application of RDF and RDFS: a series of evolving specifications and standards that support the use of knowledge organisation systems (KOS) (e.g. information retrieval thesauri, classification schemes, subject heading systems, taxonomies or any other controlled vocabulary) within the framework of the Semantic Web. The Semantic Web is many things of course; but it is predicated upon the assumption that there exist communities of practice willing and able to create the necessary structured data (generally applications of RDF) to make it work. This might be metadata, or it might be an ontology, or it might be a KOS represented in SKOS. The resulting data is open and can then be re-used, integrated, interconnected and queried. When large communities of practice fail to contribute, the model breaks down.
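For anyone who hasn't seen what "a KOS represented in SKOS" amounts to, here is a minimal sketch in Python using only the standard library. It describes a single concept as RDF triples and prints them as N-Triples. The SKOS and RDF namespace URIs are the real ones; the `http://example.org/kos/` namespace, the concept identifiers and the helper function names are my own assumptions for illustration and do not correspond to lcsh.info or any other real service.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/kos/"  # hypothetical namespace, not a real service


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_id, pref_label, alt_labels=(), broader=()):
    """Return (subject, predicate, object) triples describing one SKOS concept."""
    subject = uri(BASE + concept_id)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(pref_label)),
    ]
    # Synonyms and entry terms become skos:altLabel statements.
    triples += [(subject, uri(SKOS + "altLabel"), literal(a)) for a in alt_labels]
    # Hierarchy is expressed by pointing at the URI of the broader concept.
    triples += [(subject, uri(SKOS + "broader"), uri(BASE + b)) for b in broader]
    return triples


def to_ntriples(triples):
    """Serialise triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = concept_triples(
    "thesauri",
    "Thesauri",
    alt_labels=["Thesauruses"],
    broader=["controlled-vocabularies"],
)
print(to_ntriples(triples))
```

The point to notice is that the concept is identified by a URI rather than by its label, which is what lets other people's data link to it.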

There is a sense in which the Semantic Web has been designed to bring out the schizophrenic tendencies within some quarters of the LIS community. Whilst the majority of our community has embraced SKOS (and other related specifications), appreciates its potential and actively contributes to the evolution of the standards, there is a small coterie that flirts with the technology whilst simultaneously shrinking from the thought of exposing hitherto proprietary data. It's the 'lock down' versus 'openness' contradiction again.

In a previous research post I was involved with the High-Level Thesaurus (HILT) research project and continue my involvement in a consultative capacity. HILT continues to research and develop a terminology web service providing M2M access to a plethora of terminological data, including terminology mappings. Such terminological data can be incorporated into local systems to improve local searching functionality. Improvements might include, say, implementing a dynamic hierarchical subject browsing tree, or incorporating interactive query expansion techniques as part of the search interface. An important aim of HILT, and the original motivation behind it, is to develop a 'terminology mapping server' capable of ameliorating the "limited terminological interoperability afforded between the federation of repositories, digital libraries and information services comprising the UK Joint Information Systems Committee (JISC) Information Environment" (Macgregor et al., 2007), thus enabling accurate federated subject-based information retrieval. This is a blog so detail will be avoided for now; but, in essence, HILT is an attempt to provide a terminology server in a mash-up context using open standards. To make the terminological data as usable as possible and to expose it to the Semantic Web, the data is modelled using SKOS.
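To give a flavour of what query expansion driven by terminological data might look like in a local system, here is a rough Python sketch. It is emphatically not HILT's API: the `VOCAB` dictionary, its keys and the function names are all hypothetical, and a real system would obtain the data from a terminology service rather than hard-code it. The sketch maps whatever the user typed to a preferred term, then offers synonyms and narrower terms as candidate search terms.

```python
# Hypothetical in-memory vocabulary, keyed by preferred label. A real deployment
# would fetch this data from a terminology service rather than hard-code it.
VOCAB = {
    "thesauri": {
        "alt": ["thesauruses"],
        "narrower": ["multilingual thesauri"],
    },
    "multilingual thesauri": {"alt": [], "narrower": []},
}


def find_preferred(term, vocab):
    """Map a user's term to a preferred label, or None if it is unknown."""
    key = term.strip().lower()
    for pref, entry in vocab.items():
        if key == pref or key in entry["alt"]:
            return pref
    return None


def expand_query(term, vocab):
    """Return candidate search terms: preferred label, synonyms, narrower terms."""
    pref = find_preferred(term, vocab)
    if pref is None:
        return [term.strip().lower()]  # unknown term: search on it unchanged
    entry = vocab[pref]
    candidates = [pref] + entry["alt"] + entry["narrower"]
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


print(expand_query("Thesauruses", VOCAB))
```

In an interactive interface the returned list would be shown to the user to pick from, rather than being silently added to the query.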

But what happens to HILT when/if it becomes an operational service? Will its terminological innards be ripped out by the custodians of terminologies because they no longer want their data exposed, or will the ethos of the model be undermined as service administrators permit only HE institutions or charitable organisations to access the data? This isn't a concern for HILT yet; but it is one I anticipated several years ago. And the sad experience of lcsh.info illustrates that it's a very real concern.

Digital libraries, repositories and other information services have to decide where they want to be. This is a crossroads within a much bigger picture. Do they want their much-needed data put to good use on the Web, as some are doing (e.g. AGROVOC, GEMET, UKAT)? Or do they want alternative approaches to supplant them entirely (i.e. LCSH)? What's it gonna be, punks???
