On January 22nd 2014 two academic researchers released a paper titled ‘Epidemiological modeling of online social network dynamics’. The two researchers were John Cannarella and Joshua Spechler, both of Princeton University.
In this study they looked at how models of the spread of infectious diseases could be used to model the rise and fall of online social networks (OSNs). They used raw data from Google Trends to gauge the popularity of an OSN and used MySpace as a historical test of the model’s accuracy.
The decline of Facebook – Google Trends
Their findings? That Facebook is in decline and, according to them, will no longer be around by 2017. This was a best-fit prediction based on an apparent decrease in popularity and interest in Facebook, as measured by declining search volume on Google.
Search interest over time for ‘Facebook’ as taken from Google Trends (24th Jan 2014)
Facebook strike back at Princeton
In response, Facebook’s data team have delivered a rather amusing analysis of how Princeton University will no longer have any students by 2021. This was worked out using the same core principles that Messrs. Cannarella and Spechler used in their Facebook study: a combination of correlation and extrapolation of various Google Trends data.
The danger of correlation
Whilst the above makes for some entertaining reading, and I recommend skimming through both papers, there are serious undertones. The paper by Mr. Cannarella and Mr. Spechler could have a financial impact on Facebook if current shareholders take it at face value. As with all publicly traded companies, media coverage can and will continue to affect the share price. As of the end of trading yesterday, shares of Facebook had dropped by 1.53% to $56.63. However, after-hours trading may well reverse this small decrease, so it looks like this research may have little impact on Facebook’s value.
The serious point to this episode, however, is that correlation, in isolation, can mean nothing and must not be taken at face value. The strength of a correlation is measured by a correlation coefficient, r, that sits between -1 and 1: the closer to 1, the stronger the positive correlation, and the closer to -1, the stronger the negative correlation. There is an abundance of examples, however, where r values indicate strong correlations between two sets of data that clearly have no relationship in real life. Coupled with that are the dangers of extrapolating data sets. To highlight the issues with both these methods I have dug through my bookmarks to find a few statistical ‘jokes’ that are always worth keeping front of mind when looking at any data analysis:
The dangers of extrapolation!
Who knew Internet Explorer was such a killer
Clearly an increase in Vitamin C consumption must decrease accidents
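To make both pitfalls concrete, here is a minimal sketch in Python. It computes a correlation coefficient (a value between -1 and 1) for two series that have nothing to do with each other, then extrapolates a straight-line fit well beyond the data. All the numbers are made up for illustration, and `pearson_r`, `fit_line`, `search_interest` and `umbrella_sales` are hypothetical names of my own. This is a simple linear fit, not the epidemiological model used in the Princeton paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series (-1 to 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Made-up numbers, for illustration only (not real Google Trends data).
years = [2009, 2010, 2011, 2012, 2013]
search_interest = [100, 92, 81, 73, 60]  # hypothetical search index for a brand
umbrella_sales = [50, 47, 41, 36, 31]    # hypothetical, unrelated series

# Pitfall 1: two unrelated series that both happen to fall over time
# produce a correlation coefficient very close to 1.
r = pearson_r(search_interest, umbrella_sales)
print(f"r = {r:.3f}")

# Pitfall 2: extending the straight line past the data predicts the year
# in which interest hits zero, and negative interest after that.
slope, intercept = fit_line(years, search_interest)
zero_year = -intercept / slope
interest_2025 = slope * 2025 + intercept
print(f"Interest reaches zero around {zero_year:.1f}")
print(f"Predicted interest in 2025: {interest_2025:.1f}")
```

The correlation here is almost perfect simply because both series trend downwards, and the extrapolated line happily predicts a negative level of interest, which is impossible. Neither result says anything about the real world without context.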
What can you learn from the Princeton and Facebook study?
Hopefully this has served as a friendly reminder of how statistics can and will continue to be taken out of context. We spend a lot of time making sure we fully understand the context of our clients’ business when developing and measuring digital strategies, so that we have a clear understanding of what we need to measure, the caveats of the measurement tools and the context within which the data can be interpreted. Without that full comprehension you run the risk of following data blindly and potentially shifting resources and strategies in the wrong direction based on faulty assumptions.