Friday 28 November 2008

Catching up with the old future of databases

We have been discussing in the group what we should be teaching on our Business Information Systems undergraduate course, which is having a bit of a revamp. One area of discussion concerns 'databases', which we teach mostly in the students' second year, and 'object oriented analysis and design' (but mostly analysis), which we teach in their final year.

When I taught database sections to business students 10 years ago, we covered a history of:
  1. file access
  2. hierarchical databases
  3. network databases
  4. relational databases
  5. object oriented databases.
Of course, Object Oriented Databases hadn't happened in a big way back then. The surprising thing is that they haven't happened in a big way even now; they have spent 10 years being the next big thing. Meanwhile their close relatives, UML based analysis and object oriented design and development, have swept in from all directions. Strange that my notes from 10 years back now look like 'Space: 1999' in their predictive powers. In my day job at Village we had a book on the subject 10 years back, but a quick check of the bookshelf shows we have long since recycled it.

Apparently believers in and developers of Object Oriented Databases are bemused that everybody hasn't followed them into the promised land, particularly as so much time is spent on 'Object Relational Mapping' technologies.

All-round software development and architecture thinker and general purpose bearded guru Martin Fowler believes that the issue isn't to do with the general capabilities of Object Oriented Databases, but rather that much integration in corporations occurs in the data layer, not the business layer; hence systems depend on standardised SQL approaches to integration. He suggests in his blog post on the subject that this shared database integration requirement has been holding back the march to the future of Object Oriented Databases, creating extra inertia. Indeed, a confession: in my day job, despite being Object Oriented N-Tier architecture developers by trade and conviction, when it came to tying our own timesheet system to our task management system we used database level triggers. It's a bit like the fact that there are better ways to type than using qwerty, but we've all learnt to live with qwerty.
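To make "integration in the data layer" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns and trigger (`tasks`, `timesheet_entries`, `log_hours`) are hypothetical illustrations, not our actual schema. The point is that the link between the two systems lives in a trigger inside the shared database, so neither application's business objects know anything about it.

```python
import sqlite3


def build_shared_db() -> sqlite3.Connection:
    """Create an in-memory database shared by two hypothetical applications."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Owned by the (hypothetical) task management system
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            hours_logged REAL NOT NULL DEFAULT 0
        );
        -- Owned by the (hypothetical) timesheet system
        CREATE TABLE timesheet_entries (
            id INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            hours REAL NOT NULL
        );
        -- The integration lives in the data layer: every new timesheet row
        -- adds its hours to the matching task, whichever application wrote it.
        CREATE TRIGGER log_hours AFTER INSERT ON timesheet_entries
        BEGIN
            UPDATE tasks SET hours_logged = hours_logged + NEW.hours
            WHERE id = NEW.task_id;
        END;
    """)
    return conn


if __name__ == "__main__":
    conn = build_shared_db()
    conn.execute("INSERT INTO tasks (id, name) VALUES (1, 'Write report')")
    conn.execute("INSERT INTO timesheet_entries (task_id, hours) VALUES (1, 2.5)")
    conn.execute("INSERT INTO timesheet_entries (task_id, hours) VALUES (1, 1.0)")
    print(conn.execute("SELECT hours_logged FROM tasks WHERE id = 1").fetchone()[0])
```

Because the trigger fires whichever application inserts the row, both systems end up coupled to the SQL schema rather than to each other's objects, which is the inertia Fowler describes.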

However, with the movement towards Web Services and SOA type architectures in effect making XML the lingua franca rather than SQL, Martin Fowler suggests that the field might start to loosen up. I do wonder whether reporting is another issue, though. We produce some reports (in Crystal Reports and the equivalent) straight from our business objects, but other management reports really need to be produced straight off the (SQL) database. On occasion this reporting is separate from the main thrust of the application and uses a different technology stack.

Really, the technology of the day in software development is Object Relational Mapping tools (ORMs). These try to bind the Object Oriented business layer to the Data Entity Oriented database layer. Such connections are relatively straightforward in an unsophisticated design: my final year students currently angsting over a UML assignment will find that their Class Diagram is much the same as their Entity Relationship Diagram. But as you move deeper into doing things the Object way, the two diverge. My Village colleague Ian Bufton and I have been discussing how to line up the two layers using either tools or code generation; you can see some of his initial ponderings on his blog.
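The divergence shows up even in a tiny example. The sketch below is a hand-rolled mapping in Python with sqlite3 rather than any particular ORM tool, and the `Module` and `Assessment` classes and their tables are hypothetical. A single list-valued attribute on the object side becomes a second table plus a foreign key on the relational side, and the mapping code has to split the object apart on save and reassemble it on load.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assessment:
    title: str
    weight: int  # percentage of the module mark


@dataclass
class Module:
    code: str
    name: str
    # One attribute in the object model...
    assessments: List[Assessment] = field(default_factory=list)


SCHEMA = """
CREATE TABLE module (code TEXT PRIMARY KEY, name TEXT NOT NULL);
-- ...but a second table plus a foreign key in the relational model.
CREATE TABLE assessment (
    module_code TEXT NOT NULL REFERENCES module(code),
    title TEXT NOT NULL,
    weight INTEGER NOT NULL
);
"""


def save(conn: sqlite3.Connection, module: Module) -> None:
    """Split one object into a parent row and one child row per assessment."""
    conn.execute("INSERT INTO module (code, name) VALUES (?, ?)",
                 (module.code, module.name))
    conn.executemany(
        "INSERT INTO assessment (module_code, title, weight) VALUES (?, ?, ?)",
        [(module.code, a.title, a.weight) for a in module.assessments],
    )


def load(conn: sqlite3.Connection, code: str) -> Optional[Module]:
    """Reassemble the object from two queries; None if the module is missing."""
    row = conn.execute("SELECT code, name FROM module WHERE code = ?",
                       (code,)).fetchone()
    if row is None:
        return None
    parts = conn.execute(
        "SELECT title, weight FROM assessment WHERE module_code = ? ORDER BY rowid",
        (code,),
    ).fetchall()
    return Module(row[0], row[1], [Assessment(t, w) for t, w in parts])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    save(conn, Module("BIS201", "Databases",
                      [Assessment("Exam", 60), Assessment("Coursework", 40)]))
    print(load(conn, "BIS201"))
```

ORM tools generate this kind of save/load code from mapping metadata; writing it out by hand shows what they are hiding, before inheritance or richer object graphs enter the picture.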

Luckily these kinds of contemplation are outside the scope of what our Information Systems students at the Business School need to learn; no doubt the Computer Science department has to worry about how to teach them.

Wednesday 26 November 2008

'Gluing' searches with Yahoo!: part three in the search engine trilogy

There is plenty to comment on in the world of search engines at the moment. This post is the last in a series of discussions about search engine developments (or lack of them!). (Part I; Part II)

As Google SearchWiki was unveiled, Yahoo! announced the wider release of Yahoo! Glue. Originally developed and tested at Yahoo! India, Yahoo! Glue now has a wider release – although it remains (perpetually?) in 'beta'. Glue is an attempt to aggregate disparate forms of information on a single results page in response to a single query. I suppose Glue is a functional demonstration of the ultimate mashup information retrieval tool. Glue assembles heterogeneous information from all over the Web, including text, news feeds, images, video, audio, etc. Search for the Beatles and you will get a results page listing Wikipedia definitions, LastFM tracks for your listening pleasure, news feeds, YouTube videos, etc.

In an ironic twist, Glue appears to be defying the dynamic ethos of Web 2.0. Glue searches are not created on the fly, and only a limited number of Glue searches are available at the moment (for example, no Liverpool!). Glue has the 'beta forever' mantra as its 'get out of jail free' card, of course. Still, Yahoo! informs us that:
"These pages are built using an algorithm that automatically places the most relevant modules on a page, giving you a visually rich, diverse page all about the topic in which you're interested."
Glue is also an example of Yahoo! exploring the social web in retrieval, harnessing as it does users' opinions on the accuracy of this algorithm (e.g. irrelevant or poorly ranked results can be 'flagged' as inappropriate or irrelevant).

Glue is, and will be, for the leisure user: the person falling into the 'popular search' category in search engines. These are the users submitting the simplest queries: the teenagers searching for 'Britney Spears', and the adults searching for 'Barack Obama' or 'Strictly Come Dancing'. The serious user (e.g. student, academic, knowledge worker, etc.) need not apply. I also have reservations over whether the summarisation of results is appropriate, and whether Glue can actually assemble disparate resources that are all relevant to a query. Check out these canned searches for Stephen Stills and Glasgow. In what way is Amsterdam related to Glasgow? And why the spurious news stories for Stephen Stills? Examining the text, it is clear how they have been retrieved; but why does this happen when similar issues do not affect Yahoo! Search?

Like Google with SearchWiki, Yahoo! emphasise that Glue is not a replacement for Yahoo! Search; rather, it's a "standalone experience":
"… Yahoo! Glue(TM) beta is not to replace the Yahoo! Search experience [...] We're always challenging ourselves to explore innovative new ways to deliver great experiences. Glue is one of those experiments, with a goal of giving users one more visual way to browse and discover new things from across the Web. We'll be working to expand the number of Glue pages, improve the experience and incorporate your feedback into future versions."
Very good. But making this dynamic and scalable should be atop the Glue 'to do' list. No Liverpool!

Monday 24 November 2008

Wikifying search

This post follows a series of other posts pontificating about the efficacy of search engines in information retrieval. Over the weekend Google announced the release of Google SearchWiki. Google SearchWiki essentially allows users to customise searches by re-ranking, deleting, adding, and commenting on their results. This is personalised searching. As the Official Google Blog notes:
"With just a single click you can move the results you like to the top or add a new site. You can also write notes attached to a particular site and remove results that you don't feel belong."
The advantages of this are a little unclear at first; however, things become clearer when we learn that such changes can only be effected if you have an iGoogle account. Google have, quite understandably, been very specific about this aspect of SearchWiki. Search is their bread and butter; messing with the formula would be like dancing with the devil!
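One way to picture this kind of per-user re-ranking is as a small personal layer of edits applied over an unchanged shared ranking. The following is a minimal Python sketch under that assumption, not Google's implementation; `UserEdits` and `personalise` are hypothetical names, and comments on results are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class UserEdits:
    """Hypothetical record of one user's SearchWiki-style edits for one query."""
    promoted: List[str] = field(default_factory=list)  # moved to the top, in this order
    added: List[str] = field(default_factory=list)     # sites the user added themselves
    removed: Set[str] = field(default_factory=set)     # results the user removed


def personalise(base: List[str], edits: UserEdits) -> List[str]:
    """Return this user's view of the results; `base` itself is never modified."""
    view: List[str] = []
    seen: Set[str] = set()
    # Promoted and added sites come first, then the shared ranking in its usual order.
    for url in edits.promoted + edits.added + base:
        if url in edits.removed or url in seen:
            continue  # skip removed results and anything already placed
        seen.add(url)
        view.append(url)
    return view


if __name__ == "__main__":
    shared = ["a.example", "b.example", "c.example", "d.example"]
    mine = UserEdits(promoted=["c.example"], added=["my-notes.example"],
                     removed={"b.example"})
    print(personalise(shared, mine))
```

Run as a script, this prints `['c.example', 'my-notes.example', 'a.example', 'd.example']`, while `shared` stays exactly as it was. In this sketch the edits only make sense when stored against a signed-in user, which fits the account requirement, and the shared formula is left alone.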

Google SearchWiki doesn't do anything further to address our Anomalous State of Knowledge (ASK), nor can I see myself using it, but it is an indication that Google is interested in better exploring the potential of social data to improve relevance feedback. Google will, of course, harvest vast amounts of data pertaining to users' information seeking behaviour, which can then be channelled into improving their bread and butter. (And from my perspective, I would be interested to know how they analyse such data and effect changes in their PageRank algorithm.) Their move also resonates with an increasing trend to support users in their Personal Information Management (PIM): to assist users in re-finding information they have previously located, or to help those frequently conducting the same searches over and over. It particularly reminds me of research undertaken by Bruce et al. (2004). For example, users increasingly chose not to bookmark a useful or interesting web page, but simply to find it again, because they know they can. If you continually encounter information that is irrelevant to your area, re-rank it accordingly; so the SearchWiki ethos goes...

Perusing recent blogs, it is clear that some consider this development to have business motivations. Technology guru and Wired magazine co-founder John Battelle thinks SearchWiki is an attempt to attract more users to iGoogle (whose user base is small at the moment), whilst simultaneously making iGoogle the centre of users’ personal web universe. To my mind Google is always about business. PageRank is a great free-text searching tool, and it has permitted huge market penetration. SearchWiki is simply another business tool which happens to offer some (vaguely?) useful functionality.