Sunday, December 12, 2010

Thinking statistically – and why that’s so difficult

I came across this piece from a few months back by the Wired magazine writer Clive Thompson, “Why we should learn the language of data”. The article is one in a stream of recent articles in the popular media about how data-driven applications are changing our world. The New York Times has had quite a few pieces on this topic recently.

Thompson argues that the language of data and statistics is going to be transformational for the world and that it needs to be a core part of general education. He also explains why thinking about data trends or statistics is hard: it is not something that the intuitive wiring of the human brain readily recognizes or appreciates. The human psyche, with its fight-or-flight instincts, reacts well to big, dramatic events and badly to subtle trends. We are not fundamentally good at a number of things that good decision making calls for, such as being open to both supporting and refuting evidence, not confusing correlation with causality, factoring in uncertainty, and estimating rare events.

Most of the applications where data-driven insight has changed the world in any meaningful way have been driven by private enterprise. These changes have also been somewhat incremental in nature. Of course, data-driven insight has allowed companies to recommend movies to interested subscribers, position goods in stores more effectively, distribute at lower cost, price tickets to maximize returns, and so on. In other words, these changes may have been game changing for specific industries but not necessarily for the human race at large.

One would think numbers could have greater power than just impacting a few industries at a time. Given the sheer amount of data being produced in the world today and the rate at which both computing power and bandwidth continue to grow, we ought to have seen a much more wide-ranging impact from data-driven analysis. We should have been well down the road to progress on combating global warming and diseases like heart disease, diabetes and cancer. Government agencies, which are a really big part of the modern economy, have not been as successful at driving this form of data-driven innovation. Why is that?

This probably has to do with a fundamental lack of understanding of numbers and statistics among the population at large. The places in the world where a lot of the data gathering and processing is happening, i.e. the Western world, are also the places where an education in science and math is somewhat undervalued relative to fields like liberal arts, media, legal studies, etc. That is where the emerging economies of the world have an edge. The study of math, science and engineering has always been appropriately valued in countries like India, China and other emerging Asian giants. Now, as these countries also begin to generate, process and store data, their math- and science-educated talent will be champing at the bit to get into the data and harness its potential. Data has rightly been called another factor of production, like labour, capital and land. It is an irony of the world today that those who have data within easy reach are less inclined to use it.
