Tuesday 7 June 2011

Reinventing the wheel as a square: schema.org

A few days ago the big three search engines (Google, Bing and Yahoo!) announced schema.org. Schema.org is a "collaborative" effort in the area of vocabularies for structured data on the Web and specifies nearly 300 mini-schema that can be used to provide semantics within (X)HTML. These mini-schema are based on the Microdata specification currently under review as part of the forthcoming HTML5 specification. What? It can be used to "provide semantics"? Don't we have ways of doing this within (X)HTML already, like RDFa and Microformats?!

Indeed we do...

Schema.org essentially proposes the use of Microdata instead of RDFa (and/or Microformats) and, although derived from the RDF data model, is simpler, less expressive and, as Manu Sporny notes, "exclusive". The announcement has caused a ruckus in the Semantic Web blogosphere, particularly from Sporny, co-chair of the W3C RDFa Working Group, who has declared schema.org to be a "false choice". Even Yahoo!'s resident semantic search technology research guru, Peter Mika, who was part of the team that helped develop schema.org, acknowledges that RDFa would have been preferable because "I consider it more mature and a superior standard to Microdata in many ways". So why have Microdata and a suite of new vocabularies (the mini-schema) been proposed? This appears to be the question many people are asking, myself included.

Although the schema.org partners cite RDFa's complexity and lack of adoption as motivating factors behind their initiative, both are poor reasons and neither appears to be borne out by the evidence. RDFa can be as expressive as you like and, crucially, it can be just as simple as Microdata. Sporny provides a useful comparison of RDFa and Microdata modelling the same data, as does Gavin Carothers. And a 510% increase in RDFa usage during 2009-2010 hardly suggests slow adoption. On the contrary, I blogged about how utterly astonished I was at the uptake. (My early view is that schema.org is motivated more by commercial considerations. This seems evident from perusing the available mini-schema, many of which are clearly designed to trigger richer search result displays for the sale of particular products or services, and/or for popular topics with clear commercial potential. SEO consultants are going to clean up...)
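To make the simplicity point concrete, here is a small sketch of my own (not taken from Sporny's or Carothers' comparisons). It marks up the same hypothetical person once in Microdata and once in RDFa 1.1-style markup (using the `vocab`/`typeof`/`property` attributes), then reduces both to the same set of (predicate, value) pairs using Python's standard `html.parser`. The snippets, the person and the `FlatItemExtractor` class are all illustrative assumptions; the extractor only handles a single, flat item with plain-text values and is nowhere near a full Microdata-to-RDF or RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical snippets describing the same (made-up) person.
MICRODATA = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Researcher</span>
</div>
"""

RDFA = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>
  <span property="jobTitle">Researcher</span>
</div>
"""


class FlatItemExtractor(HTMLParser):
    """Collects (predicate, value) pairs from one flat, non-nested item.

    Hypothetical helper: understands either Microdata (itemtype/itemprop)
    or RDFa 1.1-style markup (vocab/typeof/property). Nested items,
    content/href attributes and multiple items are deliberately ignored.
    """

    def __init__(self):
        super().__init__()
        self.base = None     # vocabulary namespace, e.g. http://schema.org/
        self.pending = None  # property name waiting for its text value
        self.triples = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:                    # Microdata item
            itemtype = a["itemtype"]
            # Simplification: derive the namespace from the type URI.
            self.base = itemtype.rsplit("/", 1)[0] + "/"
            self.triples.add(("rdf:type", itemtype))
        elif "typeof" in a and "vocab" in a:   # RDFa 1.1-style item
            self.base = a["vocab"]
            self.triples.add(("rdf:type", self.base + a["typeof"]))
        # "rdf:type" is kept abbreviated; other predicates become full URIs.
        prop = a.get("itemprop") or a.get("property")
        if prop:
            self.pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.add((self.base + self.pending, text))
            self.pending = None


def extract(markup):
    parser = FlatItemExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    md, rdfa = extract(MICRODATA), extract(RDFA)
    for pair in sorted(md):
        print(pair)
    print("Same data:", md == rdfa)
```

Running it prints the three extracted pairs followed by `Same data: True`; for simple cases like this the two syntaxes differ by little more than attribute names.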

But what probably disappoints most about schema.org is the lack of commitment to re-using existing vocabularies. Isn't that an important aspect of the Semantic Web? Re-use! Minimise duplication! Schema.org duplicates the work of established RDF Schema vocabularies such as FOAF, Dublin Core, the Music Ontology Specification and many others, often in a less expressive way. Why re-invent them? But this is part of a more general phenomenon. Rather than harness existing RDF standards, which have benefitted from years of developer feedback, research and development, and diverse use cases, and which have essentially tried to deliver what developers asked for, the search engines have declared that they would prefer standards that work better for them. Their vision of structured data is one in which they, rather than the Semantic Web community, the W3C or indeed the wider Web community, control the direction of the Semantic Web. The true impact of schema.org is therefore more philosophical than technical, and not in a good way.

So perhaps the instinctive technical reaction from 'Semantic gurus' is a little melodramatic. Schema.org will change the structured data landscape to be sure, but it is not in the same marketplace as vanilla RDF, doesn't even try to be, and is far less expressive than RDFa. Moreover, the search engines have announced their continued support for RDFa and Microformats, although with no mixing of formats, please (!!!). Some, such as Mike Bergman, even see schema.org as a stepping stone for developers, fulfilling a different purpose and encouraging them to move on to richer forms of structured data. And at least schema.org uses URIs, which leaves some flexibility in how its terms can be referenced in the future.

It is interesting to note that schema.org was announced a few days before the SemTech conference, which kicked off yesterday in San Francisco. I wonder what the topic of conversation will be at the conference dinner? Well, we can look at Twitter for that...

1 comment:

  1. Further to my recent blog posting about schema.org, Michael Hausenblas and Richard Cyganiak from the Linked Data Research Centre at DERI have launched schema.RDFS.org. Remarkably, the site (which even looks like schema.org!) provides mappings of schema.org’s circa 300 schema into RDFS. Incredible work! As they state on the schema.RDFS.org website: "When we first heard about Schema.org, we had a DERI-internal discussion on how to deal with this new development. Just an hour after Michael sent out the announcement, Giovanni was the first to suggest creating a mapping. In the morning of the next day we started to develop a scraper with great support from the ScraperWiki team. Richard then took this work further and finalised the scraper. Then, we set up this site with the support of John, who donated the sub-domain on rdfs.org. After less than 24h we provided an initial mapping of the Schema.org terms to RDF and announced it to the public." They provide a list of further work, including a Microdata-to-RDF gateway. Bravo, chaps!