Saturday, June 20, 2009

Monte Carlo simulations gone bad

In my series on stress testing models, I concluded with Monte Carlo simulation as a way of understanding the range of outcomes a model can produce, and of checking that it can handle a wide set of inputs without breaking down. However, Monte Carlo simulations can be done in ways that are, at best, totally useless and, at worst, highly misleading. I want to discuss some of these failure modes in this post.

So (drumroll), here are the top Monte Carlo simulation fallacies I have come across.
1. Assuming all of the model drivers are normally distributed
Usually the biggest fallacy of them all. I have seen multiple situations where people have merrily assumed that all drivers are normally distributed and modeled them as such. For many quantities in nature, such as the heights and weights of human beings or the sizes of stars, it is fair to expect and find distributions that are normal or close to it. Not so with business data. Because of the influence of human beings, business data tends to get severely attenuated in some places and stretched out in others. There are a number of other important distributions to consider (which will probably form part of another post sometime), but assuming all distributions are normal is pure bunkum.
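To make the point concrete, here is a minimal sketch (Python with NumPy; the lognormal driver and its parameters are entirely made up for illustration) of how a normal fit to a skewed business driver understates the upper tail:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical skewed business driver (e.g. deal sizes): lognormal,
# strictly positive with a long right tail
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=n)

# "Lazy" normal approximation with the same mean and standard deviation
normal = rng.normal(loc=skewed.mean(), scale=skewed.std(), size=n)

# Compare upper quantiles: the normal fit understates the tail more and
# more as you go further out, and it even produces impossible negative
# values for a strictly positive driver
for q in (0.95, 0.99, 0.999):
    print(f"q={q}: skewed={np.quantile(skewed, q):.2f}  "
          f"normal={np.quantile(normal, q):.2f}")
```

But this is usually a rookie mistake! Move on to ...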

2. Ignoring the probabilities of extreme tail events
Another quirk of business events is the size and frequency of tail events, which astound us again and again. Just when you thought Q4 08's GDP drop of close to 6% was a once-in-a-hundred-years event, it goes and repeats itself the very next quarter. Likewise with a 10% single-day fall in market cap: guess what you see the next trading day! The short advice is: be very afraid of things that happen in the tails. Because these events occur so infrequently, fitted distributions are usually misleading out there, so if you are expecting your model to tell you when things will go bump in the night, you are in for a rude shock when they actually do.
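As a rough illustration (a sketch in Python with SciPy; the Student's t with 3 degrees of freedom is just one stand-in for a fat-tailed model, not a recommendation), compare the odds a thin-tailed normal assigns to a large move with those from a fat-tailed alternative of the same variance:

```python
from scipy import stats

k = 5.0  # a "5-sigma" daily move

# Upper-tail probability under a standard normal
p_normal = stats.norm.sf(k)

# Same move under a Student's t with 3 degrees of freedom,
# rescaled so both distributions have unit variance
df = 3
scale = ((df - 2) / df) ** 0.5
p_t = stats.t.sf(k / scale, df)

print(f"P(move > 5 sigma), normal : {p_normal:.1e}")  # roughly 3e-07
print(f"P(move > 5 sigma), t(df=3): {p_t:.1e}")       # thousands of times larger
```

But why go to the tails when there are bigger things that lurk in the main body of the distribution, such as ...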

3. Assuming that model inputs are independent
This is another example of a lazy assumption. People make it because they are obsessed with the tool at hand and its coolness-coefficient, and cannot be bothered to use their heads to solve the problem in front of them. (I am going to have a pretty big piece on lazy assumptions soon; one of my favourite soap-box items!) In real life, the assumptions and inputs to a model are usually correlated with each other to different degrees. This means the distribution of outcomes you get at the end will be crunched together (probability-density wise) in some places and sparse in others. Producing a perfectly even spread on either side of the mean is really not the goal here; the goal is to get as close an approximation of the real-life distribution as possible.
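One standard fix is to draw correlated inputs rather than independent ones. Here is a minimal sketch (Python with NumPy; the two drivers, the -0.6 correlation, and the marginal distributions are all made up for illustration) using a Cholesky factor of the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical correlation between two drivers, "price" and "volume":
# higher prices tend to mean lower volumes, hence the negative entry
corr = np.array([[1.0, -0.6],
                 [-0.6, 1.0]])
L = np.linalg.cholesky(corr)

z = rng.standard_normal((n, 2))  # independent standard normals
correlated = z @ L.T             # columns now have correlation ~ -0.6

# Map each column onto whatever marginal suits the driver
price = 100.0 * np.exp(0.2 * correlated[:, 0])   # skewed, strictly positive
volume = 1_000.0 + 150.0 * correlated[:, 1]      # roughly normal

revenue = price * volume
print(f"sample correlation: {np.corrcoef(correlated.T)[0, 1]:+.2f}")
print(f"mean simulated revenue: {revenue.mean():,.0f}")
```

But then, if only things were that simple! Now, you could be really smart, get all of the above just right, and build a really cool tool. You could then fall into the fourth fallacy of thinking ...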

4. That it is about the distribution or the tool. It is NOT! It is about what you do with the results of the analysis
The Monte Carlo simulation tool is indeed just that, a tool. The distributions it produces are not an end in themselves; they are an aid to decision making. In my experience, a well-thought-out decision-making framework needs to be created to make use of the output distributions. For example, take a framework for evaluating investment decisions using NPV. It could be: I will make the investment only if a.) the mean NPV is positive, b.) fewer than 20% of the outcomes have a negative NPV, and c.) fewer than 5% of the outcomes have an NPV below -$50 million. There is really no great science in coming up with these frameworks, but the framework has to be something the decision maker is comfortable with, and it should address the uncertainty in outcomes.
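To make that concrete, here is a minimal sketch (Python with NumPy; the function name, the thresholds, and the placeholder NPV distribution are all just illustrative) of the three-part rule above applied to a vector of simulated NPV outcomes:

```python
import numpy as np

def accept_investment(npv_samples: np.ndarray) -> bool:
    """Hypothetical framework: positive mean NPV, fewer than 20% of
    outcomes negative, fewer than 5% of outcomes below -$50 million."""
    mean_ok = npv_samples.mean() > 0
    downside_ok = np.mean(npv_samples < 0) < 0.20
    tail_ok = np.mean(npv_samples < -50) < 0.05
    return mean_ok and downside_ok and tail_ok

# Placeholder outcomes (in $ millions) standing in for the output of an
# upstream Monte Carlo model; these parameters happen to pass all three tests
rng = np.random.default_rng(7)
npvs = rng.normal(loc=50, scale=55, size=100_000)
print(accept_investment(npvs))  # True
```

The thresholds themselves carry the decision maker's risk appetite; the simulation just supplies the distribution they are applied to.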

So, have you come across any of these fallacies in your work? How have you seen the Monte Carlo tool used and misused? And what decision-making frameworks (if any) were paired with it to drive good decisions?
