Big Data Analytics: The place of Systems Modeling in Analytics

When one talks about predictive analytics, the typical thought process goes in the direction of regression, neural nets, and data mining techniques: techniques that savvy marketers (consumer product companies, banks) have been using for close to two decades now to build insights about consumer behaviour. Systems modeling, or System Dynamics, is not something that immediately springs to mind.

So what is systems modeling all about? Systems modeling is creating a mathematical representation of a real-world phenomenon, covering as wide a range of inputs as feasible along with the most valuable outputs. The systems model tries to explain how the inputs translate to outputs. Where the systems model differs from a statistical predictive model is in its purpose: it does not try to explain variance in the output. Instead, it tries to establish structural relationships between the inputs and the outputs. The model then stress-tests those structural relationships by varying the inputs and looking at the impact on the outputs.
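To make the idea of "fix the structure, then vary the inputs" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the single stock, the names `simulate` and `stress_test`, and the parameter values are hypothetical and are not drawn from any real system or dataset.

```python
def simulate(inflow_rate, drain_fraction, steps=50, initial_stock=0.0):
    """Run a one-stock model and return the stock level after each step.

    Structural relationship (assumed for illustration only):
    each step the stock gains `inflow_rate` and loses `drain_fraction`
    of its current level.
    """
    stock = initial_stock
    history = []
    for _ in range(steps):
        stock = stock + inflow_rate - drain_fraction * stock
        history.append(stock)
    return history


def stress_test(inflow_rates, drain_fraction):
    """Hold the structure fixed, vary one input, record the final output."""
    return {rate: simulate(rate, drain_fraction)[-1] for rate in inflow_rates}


if __name__ == "__main__":
    # Hypothetical input values, chosen only to show the sweep.
    for rate, final_level in stress_test([1.0, 2.0, 4.0], drain_fraction=0.1).items():
        print(f"inflow_rate={rate}: final stock level {final_level:.2f}")
```

The point of the sketch is the shape of the exercise rather than the numbers: the equation inside `simulate` is the structural relationship, and `stress_test` is the step where the inputs are varied to see how the output responds.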

A good example of a subject that can be systems-modeled (my verb!) is the problem of terrorism. The problem has different inputs: unhappy people, territorial disputes, foreign powers wanting to create trouble, funding, media coverage, etc. The immediate output is various acts of terrorism such as assassinations of leaders, suicide bombings, etc. It might be feasible to build a model that lays out a structure for how these various inputs combine and interact with one another to cause the outputs. (If one goes back over the past 150 years, there should be plenty of data points.) Another way of looking at the output is a more holistic view that measures the damage done in terms of lives lost, economic damage incurred, etc.

What would be the purpose of this model? In my opinion, the value of such a model lies less in predicting where the next terrorist strike is going to be, or how big it is going to be. (This is incidentally what a classic statistical model is going to try to do.) Rather, the model should try to explain which confluence of factors produces a large output event (lives lost, economic damage) and how some of those factors can be controlled ONCE an insurgency is already underway. The hypothetical model I am talking about does not try to predict, but rather to strengthen our understanding of the system dynamics. The model would have a PoV on which inputs can be controlled and to what extent they are controllable.

The model would then be used to understand how a large-impact event can be prevented or its impact minimized. So if the federal government had $100 billion to spend, how much should it spend on homeland security vs. promoting a positive image of the United States through foreign media? The model might tell us that it is pointless to spend more than, say, $500 million on sophisticated software to block large untraced wire transfers, as there are other ways in which funding can be made available to the perpetrators of a terrorist act. Controlling the funding for an insurgency through sophisticated algorithms that detect money laundering and layering may be pointless if the actual money gets exchanged through a non-electronic channel.
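The funding argument above can be written down as a toy structural model. This is only a sketch under loud assumptions: the function `funding_delivered`, the two-channel structure, the saturating detection curve, and every default value are hypothetical, in arbitrary units, and not estimates of anything real.

```python
def funding_delivered(detection_spend, demand=100.0, cash_capacity=80.0,
                      half_effect_spend=50.0):
    """Funding that still reaches its destination for a given detection spend.

    Assumed structure (illustrative only):
    - detection spend blocks a share of electronic transfers, with
      diminishing returns (half the flow is blocked at `half_effect_spend`);
    - whatever is blocked is re-routed through a non-electronic channel,
      up to `cash_capacity`.
    """
    blocked_share = detection_spend / (detection_spend + half_effect_spend)
    electronic = demand * (1.0 - blocked_share)
    displaced = demand - electronic
    non_electronic = min(displaced, cash_capacity)
    return electronic + non_electronic


if __name__ == "__main__":
    # Sweep the one controllable input and watch the output barely move.
    for spend in [0.0, 50.0, 200.0, 450.0, 5000.0]:
        print(f"detection spend {spend:>7.1f} -> funding delivered "
              f"{funding_delivered(spend):.1f}")
```

Under these made-up assumptions, the delivered funding never drops below the capacity of the non-electronic channel, however much is spent on detection. That is the kind of structural conclusion a systems model is meant to surface: the limit comes from how the channels connect, not from how good the detection software is.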

So an agency interested in curbing terrorism might be better advised to, say, over-invest in trauma care facilities and emergency services in vulnerable areas. That way, when a strike does take place, medical help for the people affected is close at hand and casualties are minimized.

Why am I writing all this? Analytical problem solving is not just about fancy statistical algorithms or cool math. It is also about thinking hard about problems and creating their mathematical representations - and then being crystal clear about what those mathematical representations can and cannot do. This is where the systems modeling approach can be a very effective part of a business modeler's arsenal.

I'll close out with a couple of links that prompted this wave of thinking on the subject. One is a paper in the journal Nature in which the authors present a statistical model of insurgency events. The link is here. It's a gated article.

The following link has a very good critique of the article.

Saturday, December 19, 2009
