Why are power laws still in the news? Michael Stumpf and Mason Porter recently published a commentary on the topic in Science (“Critical Truth About Power Laws“). Yet the nature of long tailed distributions has been a debate in physics since the work on critical phenomena in the 70’s. This then continued in the context of fractals in the 80s, self-organised criticality in the 90’s and now in terms of complex networks. So the issues should be familiar to theoretical physicists and applied mathematicians working in these areas. However the field of complexity is multidisciplinary and there is a real need to reach out to talk to those researchers from other backgrounds to explain about what we have learnt in all these debates about power laws. Often they have their own experiences to add to this debate – financial mathematicians have long been interested in long tailed distributions. Physicists may understand the subtext behind attention grabbing headlines but there is no reason to think others know how large a pinch of salt to add to these claims. That has certainly been my impression when hearing experts from other fields refer to some of early claims about complex networks. Perhaps the most worrying aspect is that many of the points raised in this age old debate are still not addressed in many modern publications. This is part of the frustration I hear when reading the Stumpf and Porter article. To me this is a good example of the repeated failure in the quality control derived from the current referring system (but that is a debate for another time).
One of the key points made by Stumpf and Porter, though I’ve heard this many times before, is the lack of statistical support behind many claims for a power law. Identifying a power law behaviour is not trivial. I always remember that Kim Christensen at Imperial recommended to me that four decades of data were needed following his experiences with power laws while Stumpf and Porter suggest two. Many examples fail to pass even this lower limit.
Aaron Clauset, Cosma Shalizi and Mark Newman provide one popular statistical recipe and code that addresses most issues ( “Power-law distributions in empirical data” – and check out the blogs on the arXiv trackback). The approach will produce much more useful results than I’ve seen in several published papers. It might not be perfect though. I think I’ve seen warnings against the use of cumulative distributions as problems in the early part of the data will effect many more data points that issues in later data points.
In fact I would go further. I often reject papers with no error estimates in measured quantities, no uncertainties in data points – I mark our own physics students down for these failings. Why show points and a best fit on a log-log plot except as an illustration? A plot of residuals (differences between data and fit) is far more informative scientifically. Many refereed papers appear to let these things go limiting the impact of their message.
Another part of the debate and a key message in Stumpf and Porter was the meaning of a power law. Most researchers I know realised any hope of universality in the powers seen in network data was misplaced in the early naughties. For me this was one of my first questions and as I did not see it answered in the literature it led me to think about the formation of scale-free networks in the real world. I wrote up as a paper with Jari Sarämaki from Finland (“Realistic models for the formation of scale-free networks“). Existing models just didn’t explain this at the time, precisely what Stumpf and Porter say is still missing in many discussions about power laws even today.
There was a technical point in the Stumpf and Porter article that did catch my eye but in the end left me disappointed. They highlighted that for random numbers drawn from any long tailed distribution there is a generalization of the central limit theorem. It tells us that sums of many random numbers drawn from such distributions will lie in distributions with a power law tail (you have to define all this carefully though). However this is not as powerful as it sounds. The definition of a long tailed distribution in this context is one with a power law tail. Seems circular to me.
Less seriously I thought the sketch provided in the Stumpf and Porter Science article was a bad example to set in an article about bad representations of data. One axis is a qualitative judgment while the other is an unspecified statistical measure. The ‘data points’ on the plot are mostly unspecified. The one marked Zipf’s law is presumably for the classic data on cities though which of many data sets this on this topic I’m not sure. What I think was intended was a sketch to indicate the subjective judgments of the authors on the nature of power laws which have been suggested in different fields. This would be terms of their two main themes: the statistical support for power laws, and the modelling and interpretation of results. In the end the sketch given didn’t convey anything useful to me.
Still these are minor quibbles. If an article can keep the Imperial Complexity and Networks programme’s weekly meeting talking for an extra half an hour, a discussion led by Gunnar Pruessnar from Imperial, it has got to be doing something good.