Friday, 16 September 2011

Blog lifecycles

Understanding the lifecycle and the dynamics of blogs has been a topic of interest within computing and information science for many years.  Blogs exhibit peculiar social and temporal features thus making them a rich domain of study and, quite frankly, more interesting than static web pages.  Since this blog is almost four years old, now seems like an appropriate time to review the health of the ISG Blog.  It is not my intention to expose our blog to the kind of detailed analysis one would expect to find in the pages of JASIST; but let's look at some of the most basic numbers...

Now, in an ideal world, or a sensible one for that matter, one would be able to export a .csv file from Blogger containing a wealth of data on the number of blog posts, the hits these posts have attracted (per week and per month), the number of comments, the identity of referring sites, etc.  Alas, most of this data is unavailable, and any data that is available has to be compiled manually, making any serious analysis difficult.  Despite these obstacles I displayed sufficient stamina to manually generate some basic blog data and to describe it using the Dataset Publishing Language (DSPL) for running through the Google Public Data Explorer.  (There still remains some XML pain but I did it anyway...).  The data available pertains to the number of blog postings, their total hits (2007-2011), the number of comments per blog post and the length of postings.  Data Explorer provides a good overview of the data but doesn't perform any statistical analysis. I have therefore included some further data analysis below. Anyway, some of the headline figures are as follows:
  • 85 blog postings have been published since October 2007.
  • George Macgregor (i.e. me) was the most prolific blogger, accounting for 87% of all posts.  Johnny Read was next in line, producing 9.41% of all posts; Francis Muir, Jack OFarrell and Keith Trickey each contributed 1.18% of the total posts.
  • 2009 was the most productive year for the blog, with 33 posts being published, accounting for 38.82% of the blog's total posts.
  • The mean number of page views was 29 per blog post (M = 29; SD = 90; IQR = 18).
  • On average, 0.8 reader comments were made in response to the blog postings (M = 0.8; SD = 1.23; IQR = 1).
  • The most read post was "Blackboard on the shopping list" from October 2009, attracting 751 page views.
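The percentage figures above are simple shares of the 85 posts. As a minimal sketch of the arithmetic in Python, assuming per-author counts of 74, 8, 1, 1 and 1 (these counts are inferred from the reported percentages and are not taken from the blog data itself):

```python
# Post counts per author. Only the total (85) and the percentages come from
# the post; the per-author counts are inferred from them (an assumption).
posts_by_author = {
    "George Macgregor": 74,
    "Johnny Read": 8,
    "Francis Muir": 1,
    "Jack OFarrell": 1,
    "Keith Trickey": 1,
}

total_posts = sum(posts_by_author.values())


def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


author_shares = {name: share(n, total_posts) for name, n in posts_by_author.items()}
share_2009 = share(33, total_posts)  # 33 posts were published in 2009

for name, pct in author_shares.items():
    print(f"{name}: {pct}%")
print(f"2009: {share_2009}%")
```

Under these assumed counts the shares match the headline figures, which suggests the percentages were calculated against the full set of 85 posts.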
Let's look at the last headline figure first.

Figure 1: ISG Blog hits (2007-2011) by author, as viewed in Google Public Data Explorer. 
Blogger provides summary data on blog posting page views, or "hits" if you prefer.  I extracted these manually to get a measure of post impact.  An average of 29 page views is disappointing and – as you can see from the Data Explorer – although there are some traffic spikes which account for the high data dispersion (i.e. SD = 90; IQR = 18), some of the individual page view figures are very low.  However, we must remember two important caveats.  First, Google Analytics (used to compile the Blogger data) uses a rigid definition of page views in order to flush out transient visitors.  Second, many – and perhaps the majority of those dedicated to reading the ISG Blog – will read postings using an RSS reader.  Unfortunately, even Google Analytics can't capture data on consumption made via RSS.  It is therefore safe to assume that these figures grossly underestimate the number of ISG Blog readers.  With this in mind, the top ten most read postings were as follows:
  1. Blackboard on the shopping list (751 page views)
  2. The Kindle according to Cellan-Jones (301 page views)
  3. Some general musing on tag clouds, resource discovery and pointless widgets (235 page views)
  4. Crowd-sourcing facetted information retrieval (103 page views)
  5. Web Teaching Day – 6 Sep 2010 (74 page views)
  6. How much software is there in Liverpool and is it enough to keep me interested? (67 page views)
  7. Trough of disillusionment for microblogging and social software (56 page views)
  8. Jimmy Reid and the public library: an education like no other (52 page views)
  9. Goulash all round: Linked Data at the NSZL (50 page views)
  10. Shout "Yahoo!": more use of metadata and the Semantic Web (46 page views)
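For anyone wanting to reproduce this kind of summary, here is a minimal Python sketch that computes the mean, standard deviation and interquartile range for the ten page view figures listed above. It only covers the top ten, so the results will differ from the whole-blog figures (M = 29; SD = 90; IQR = 18), and the IQR depends on the quantile method used (the standard library default is assumed here, which may not be the method behind the reported numbers).

```python
import statistics

# Page views for the top ten posts, as listed above.
top_ten_views = [751, 301, 235, 103, 74, 67, 56, 52, 50, 46]

mean_views = statistics.mean(top_ten_views)
median_views = statistics.median(top_ten_views)
sd_views = statistics.stdev(top_ten_views)  # sample SD (n - 1 denominator)

# Quartile cut points; the interquartile range is Q3 minus Q1.
q1, _, q3 = statistics.quantiles(top_ten_views, n=4)
iqr_views = q3 - q1

# Share of the top-ten traffic taken by the single most read post.
top_post_share = 100 * max(top_ten_views) / sum(top_ten_views)

print(f"M = {mean_views:.1f}; median = {median_views:.1f}")
print(f"SD = {sd_views:.1f}; IQR = {iqr_views:.1f}")
print(f"Most read post: {top_post_share:.1f}% of top-ten page views")
```

Even within the top ten the standard deviation exceeds the mean, which is the same pattern of a few traffic spikes dominating the dispersion.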
Rather surprisingly – and disappointingly, given the extra time they take to compose – the top ten most read blog postings tend not to be the longer, more intellectually considered contributions, but the more ephemeral ones.  This is clear from the #1 most read posting, which was merely a brief comment on a blogosphere rumour that Google might acquire Blackboard.  This post evidently fed into the social and temporal characteristics that can typify blogs and must be considered – using the more up-to-date jargon of the Twitterati – a "trending" topic.  It attracted the highest number of page views (751) and comments (9), and to date remains popular (according to some extra data that I have...).  In fact, using Gruhl et al.'s macroscopic blog characteristics typology, this posting could be considered "Mostly Chatter".  "Mostly Chatter" postings are those that attract attention or discussion at moderate levels throughout the entire period of analysis.  The majority of other postings fall within Gruhl et al.'s "Just Spike" category, i.e. they are postings that become active but then suddenly become inactive and demonstrate a very low level of chatter.  This appears to be corroborated by the generally low page view figures for most posts and the average comment figures (M = 0.8; SD = 1.23; IQR = 1).

Figure 2: Comments per ISG Blog post (2007-2011).
It is also interesting to note that although Francis Muir only made one blog post during the lifetime of the blog, his post features in the top five most read contributions (74 page views).  Again, this is perhaps because it was a bursty topic and was trending at the time of publication.  It is nevertheless reassuring that at least some of the more intellectually considered contributions feature in the top ten (e.g. 3, 6 and 8).  On average though, the rest of us attracted fewer eyeballs: George Macgregor (M = 31; SD = 96; IQR = 18) and Johnny Read (M = 14; SD = 31; IQR = 25).

Figure 3 provides an overview of blog post length. As a frequent author of the longest blog posts I have always been worried that I might be boring readers to death (5 posts > 1,000 words).  I always felt longer posts were necessary to cover our intellectually stimulating topics.  Yet, as it transpires, my average post was shorter than expected (M = 534; SD = 306; IQR = 355), and was actually shorter than Johnny Read's average (M = 668; SD = 154; IQR = 106).  I know, I know...  My SD and IQR are far higher, but let's not focus on that because, on the face of it, Johnny would appear to be more boring than I am! ;-)

Figure 3: Post length on the ISG Blog by author (2007-2011).
Which leads to the topic that started all this: ISG Blog health, or the blog lifecycle if you prefer.  What is the current state of health of the ISG Blog?  We noted that 2009 was the most productive year for the blog.  This can be easily observed from the graphs, most of which reveal a busy profile during 2009.  But according to the graph on total posts (Figure 4), the data reveals a spike in 2009, with a comparable number of contributions in 2010 and 2008, and a similar pattern in 2011 and 2007.  In other words, the trend in 2011 seems to be for decline and perhaps even death. 

Figure 4: Total posts per year by author (2007-2011).
Researchers have been keen to model blog failure for many years.  For example, Qazvinian et al.'s research (presented at the International AAAI Conference on Weblogs and Social Media) identifies blogs that are prone to "connection failure" and "commitment failure".  As the names of these phenomena suggest, connection failure describes a blog that fails to enjoy the network effect within the blogosphere, either because other blogs are not commenting on or linking to that blog, or because the readers are not engaged enough to comment on postings.  Commitment failures are more difficult to interpret from Qazvinian et al.'s data; however, their data clearly indicates that new bloggers (of circa one month) typically account for 80% of all blog failures (i.e. quits) within any given time window.  The most dangerous time in which the ISG Blog could succumb to commitment failure has therefore been and gone.  But despite making it past the one month mark by almost four years, the ISG Blog is clearly past its prime.  I made half as many posts in 2010 as I did in 2009, and I have thus far made fewer than half my 2010 contributions in 2011.  A similar trend can be observed in the number of Johnny Read's posts too.  The only tenuous consolation is that as time has gone by my average blog post length appears to have increased.  However, although this appears to be borne out by the scatterplot (Figure 5 - yup, Data Explorer can't do scatterplots or trendlines), in which an upward linear regression trendline can be observed, it isn't borne out by the associated numbers (R² = 0.0442).
Figure 5: ISG Blog post length for George Macgregor (2007-2011), with linear regression line.
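A trendline can slope upwards and still explain almost none of the variation in the data, which is what a value as small as 0.0442 would indicate if it is read as the coefficient of determination (R², an assumption here). The following Python sketch fits an ordinary least-squares line and computes R² by hand. The word counts are made-up illustrative values, not the ISG Blog data, and `linear_fit` is a hypothetical helper name.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot

    return slope, intercept, r_squared


# Hypothetical data: post sequence number vs. word count (illustrative only).
post_index = [1, 2, 3, 4, 5, 6, 7, 8]
word_count = [500, 900, 300, 700, 400, 1000, 350, 750]

slope, intercept, r_squared = linear_fit(post_index, word_count)
print(f"slope = {slope:.2f} words per post; R^2 = {r_squared:.4f}")
```

With these illustrative values the slope is positive but R² is close to zero, so the line is technically rising while post sequence tells you next to nothing about post length.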
It is no surprise that my diagnosis is that the ISG Blog suffers from a mixture of connection and commitment failure, and that my departure at the end of September could be the final nail in the ISG Blog coffin.  The question is: can someone administer CPR after I depart to save it from near certain death?
