Showing posts with label LCSH. Show all posts

Friday, 1 May 2009

LCSH as Linked Data ... officially!

Yesterday was, in my estimation, pretty historic. The Library of Congress officially launched the LC Authorities and Vocabularies service. You might recall a previous post relating to lcsh.info in which I lamented the LC's decision to pull down a SKOS demonstrator of LCSH, explicitly designed to explore the possibilities of Linked Data and dereferenceable URIs. All the background is in the previous post; but the whole episode appears to have been a PR disaster for LC.

The great news is that the LC Authorities and Vocabularies service (let's call it LCAV henceforth, shall we?) officially re-launched lcsh.info in a bigger, better and much improved form. The service essentially enables both humans and machines to access a plethora of LC authority data. Like lcsh.info, the service employs Semantic Web approaches to exposing this data and follows Linked Data practice by publishing and linking it on the Web via dereferenceable URIs.

Five minutes exploring the website reveals that LCAV serves up the entire LCSH for free, with incredible search and browse functionality, leaving Connexion in the shade. The concept URIs point to detailed data modelled in SKOS as RDFa for human readability, but with links to SKOS as RDF/XML, N-Triples and (the less familiar?) JSON for machine processing. RDF graphs can even be visualised by clicking, well, the 'visualize' tab – incredible. Mappings to other vocabularies are also provided.
On top of all this, LCSH can be downloaded in its entirety as RDF/XML or N-Triples (SKOS)! LCAV also indicates that further authority data will be made available soon.
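
For anyone wondering what "SKOS as N-Triples" means in practice for machine processing, here is a minimal sketch of reading such data with nothing but the Python standard library. The `example.org` URIs and labels in `SAMPLE` are invented for illustration and are not real LCSH identifiers. The parser is deliberately naive: it assumes URI subjects and simple literals, so a real application would more sensibly use a proper RDF library. Only the SKOS namespace is taken from the W3C specification.

```python
import re
from collections import defaultdict

# The SKOS core namespace defined by the W3C.
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample data: these example.org URIs and labels are
# invented for illustration and do not come from the LC service.
SAMPLE = """\
<http://example.org/concept/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Cataloging"@en .
<http://example.org/concept/1> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/concept/2> .
<http://example.org/concept/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Library science"@en .
"""

# Naive pattern for one N-Triples statement: URI subject, URI predicate,
# and an object that is either a URI or a quoted literal with an optional
# language tag or datatype. Blank nodes are not handled.
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+'
    r'(?:<([^>]+)>|"((?:[^"\\]|\\.)*)"(?:@[\w-]+|\^\^<[^>]+>)?)'
    r'\s*\.\s*$'
)


def parse_skos(text):
    """Group SKOS properties by concept URI: {uri: {property: [values]}}."""
    concepts = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank lines, comments and anything unsupported
        subject, predicate, obj_uri, obj_literal = match.groups()
        if predicate.startswith(SKOS):
            prop = predicate[len(SKOS):]  # e.g. "prefLabel", "broader"
            value = obj_uri if obj_uri is not None else obj_literal
            concepts[subject][prop].append(value)
    return concepts


concepts = parse_skos(SAMPLE)
for uri, props in concepts.items():
    print(uri, dict(props))
```

The point is simply that each concept URI becomes a key under which labels and hierarchical links (`broader`, `narrower` and so on) can be looked up, which is what makes this kind of data straightforward to re-use in local systems.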

Make no bones about it, this is historic stuff, not only because the service is so good but because this terminological data is no longer locked down. I think it's important to stroke our imaginary beards over the significance of the LC's change of direction. Is this the beginning of the end for locked down terminological data?! Will they be like dominoes henceforth? A fiver says DDC does the same by the end of the year. Any takers???

Wednesday, 24 December 2008

SKOS-ifying Knowledge Organisation Systems: a continuing contradiction for the Semantic Web

A few days ago Ed Summers announced on his blog that he was shutting down lcsh.info. For those who don't know, lcsh.info was a Semantic Web demonstrator developed by Ed with the express purpose of illustrating how the Library of Congress Subject Headings (LCSH) could be represented and its structure harnessed using the Simple Knowledge Organization System (SKOS). In particular, Ed was keen to explore issues pertaining to Linked Data and the representation of concepts using URIs. He even hoped that the URIs used would be Cool URIs, linking eventually to a bona fide LCSH service were one ever to be released. Sadly, it was not to be... The reasons remain unclear but were presumably related to IPR. As the lcsh.info blog entry notes, Ed was compelled to remove it by the Library of Congress itself. The fact that he was the LC's resident Semantic Web buff probably didn't help matters, I'm sure.

SKOS falls within my area of interest and is an initiative of the Semantic Web Deployment Working Group. In brief, SKOS is an application of RDF and RDFS and is a series of evolving specifications and standards used to support the use of knowledge organisation systems (KOS) (e.g. information retrieval thesauri, classification schemes, subject heading systems, taxonomies or any other controlled vocabulary) within the framework of the Semantic Web. The Semantic Web is many things of course; but it is predicated upon the assumption that there exist communities of practice willing and able to create the necessary structured data (generally applications of RDF) to make it work. This might be metadata, or it might be an ontology, or it might be a KOS represented in SKOS. The resulting data is open and can then be re-used, integrated, interconnected and queried. When large communities of practice fail to contribute, the model breaks down.

There is a sense in which the Semantic Web has been designed to bring out the schizophrenic tendencies within some quarters of the LIS community. Whilst the majority of our community has embraced SKOS (and other related specifications), appreciates the potential and actively contributes to the evolution of the standards, there is a small coterie that flirts with the technology whilst simultaneously shrinking at the thought of exposing hitherto proprietary data. It's the 'lock down' versus 'openness' contradiction again.

In a previous research post I was involved with the High-Level Thesaurus (HILT) research project and continue my involvement in a consultative capacity. HILT continues to research and develop a terminology web service providing M2M access to a plethora of terminological data, including terminology mappings. Such terminological data can be incorporated into local systems to improve local searching functionality. Improvements might include, say, implementing a dynamic hierarchical subject browsing tree, or incorporating interactive query expansion techniques as part of the search interface. An important aim, and the original motivation behind HILT, is to develop a 'terminology mapping server' capable of ameliorating the "limited terminological interoperability afforded between the federation of repositories, digital libraries and information services comprising the UK Joint Information Systems Committee (JISC) Information Environment" (Macgregor et al., 2007), thus enabling accurate federated subject-based information retrieval. This is a blog so detail will be avoided for now; but, in essence, HILT is an attempt to provide a terminology server in a mash-up context using open standards. To make the terminological data as usable as possible and to expose it to the Semantic Web, the data is modelled using SKOS.
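
To give a flavour of what query expansion from terminological data might look like in a local system, here is a toy sketch. It is emphatically not HILT's API or data: the `VOCAB` dictionary, its terms and the `expand_query` function are all hypothetical, loosely modelled on SKOS-style alternative labels and narrower relations.

```python
# Hypothetical in-memory vocabulary, invented for illustration.
# "alt" plays the role of SKOS alternative labels, "narrower" of
# SKOS narrower concepts. This is not HILT's data or interface.
VOCAB = {
    "libraries": {
        "alt": ["library"],
        "narrower": ["digital libraries", "national libraries"],
    },
    "digital libraries": {"alt": ["electronic libraries"], "narrower": []},
    "national libraries": {"alt": [], "narrower": []},
}


def expand_query(term, vocab, depth=1):
    """Return the term plus synonyms and narrower terms, `depth` levels down."""
    expanded = []

    def visit(current, remaining):
        entry = vocab.get(current)
        # Add the term itself, followed by any alternative labels.
        for candidate in [current] + (entry["alt"] if entry else []):
            if candidate not in expanded:
                expanded.append(candidate)
        # Descend into narrower terms while there is depth left.
        if entry and remaining > 0:
            for child in entry["narrower"]:
                visit(child, remaining - 1)

    visit(term.lower(), depth)
    return expanded


print(expand_query("Libraries", VOCAB))
```

A search interface could offer the expanded terms to the user as suggestions rather than applying them blindly, which is the "interactive" part of interactive query expansion.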

But what happens to HILT when/if it becomes an operational service? Will its terminological innards be ripped out by the custodians of terminologies because they no longer want their data exposed, or will the ethos of the model be undermined as service administrators permit only HE institutions or charitable organisations to access the data? This isn't a concern for HILT yet; but it is one I anticipated several years ago. And the sad experience of lcsh.info illustrates that it's a very real concern.

Digital libraries, repositories and other information services have to decide where they want to be. This is a crossroads within a much bigger picture. Do they want their much needed data put to a good use on the Web, as some are doing (e.g. AGROVOC, GEMET, UKAT)? Or do they want alternative approaches to supplant them entirely (i.e. LCSH)? What's it gonna be, punks???

Monday, 26 November 2007

All the way from America...

Research is a much-abused term. If you ask undergraduate students they will confidently describe a Google-based “bash in a couple of terms and hit the return key” as research and subsequently suffer from the delusion that that is all research is. What I have been engaged in for the last week could, I think, be defined as a “fishing trip”. This is a research approach from the “old days” before the whole world was claimed as available online.

When you were opening a new major area of research you would take yourself off to a monster library (The British Library at Boston Spa was ideal – due to the immense journal collection it possessed) and using printed abstracts and indexes would slowly wade back through the last ten or twenty years of “stuff” as appropriate. At the end of the exercise you would have reasonable confidence that you had covered the field in detail. The subsequent reading of the literature gathered would allow you to patch what gaps there were. As my LCSH topic predates the standard abstracting and indexing services, this older approach was required.

So, ensconced on the 5th floor of the Library of Congress Adams building, I worked my way through sixty years’ worth of Library Journal, about thirty volumes of the Bulletin of the American Library Association and about ten years of the Catalogers’ and Classifiers’ Yearbook. The most recent volumes consulted were 1940. I would regularly branch off to pick up specialist subject heading lists or contemporary textbooks as I moved forward.

The result of this process can be evaluated in at least two ways. A simple measure of the thickness of the stack of photocopying to be brought back evidences (in a real sense) the extent of the information capture. The other measure came as a surprise to me; it just kind of sneaked up on me as the process developed. My confidence in my knowledge of the topic strengthened as the week proceeded. The previous slow and laborious accumulation of material over the last two years had not inspired my confidence (I was painfully aware of gaps in the process – even though I did not really know what the gaps were!). Having dug in and worked my way through the major sources of information, my doubts as to how to proceed have cleared and the next stage in the process seems quite straightforward (at the level of ideas!).

The total luxury of having a whole week to dedicate to nothing else except the research has been massively helpful. I have woken, washed, eaten, walked, worked and slept the research. This has allowed effective thinking to occur as those thousand and one well-intentioned interruptions that plague my working and home life were simply turned off – along with the mobile phone.

Today is Thanksgiving – the massive American family festival, everything is closed – even the food facility in the hotel – just a continental breakfast – I have to eat out tonight – if I can find somewhere. So the day has been spent sorting my document harvest so I know what I need to copy in my Friday morning visit to the Library.

The real task begins when I get back to Liverpool as I attempt to convert this short sprint in Washington into the steady-paced marathon that is required to deliver this research as an academic thesis.