Monday, June 1, 2015

Journey through building a Predictive Analytics solution

I have now spent nearly 10 years building predictive models. These have ranged from some detailed segment or population level models using Excel to cutting-edge individual level models for credit risk, response propensity, life-events predictions and so on using statistical packages in R and Python, and back in the days, using SAS. At some point in the last ten years, I also did a foray into text analytics and trying to make sense out of unstructured data.

Building predictive models is tough and takes tme and energy. It is also emotionally consuming and exhausting. Over the years, I have been able to identify four distinct phases through the build-out of any model where I have gone through alternate cycles of depression and joy. Thought I’d share some of that and see whether these cycles resonated with others in the same line of work. And it starts to look somewhat like the picture below.

Phase 1: “Getting the data is really REALLY hard!!!!”

The first phase of depression happens roughly for the first third of any project. You are all excited about this really cool customer problem or business challenge you have in front of you and you have just discovered the perfect predictive modeling solution for the problem. Now all you need to do is to get the data and you will be off to Predictive Modeling Superstardom.

Except that it never happens that way. Data is always messy and needs careful and laborious curation. And if you take shortcuts and are messy about how you handle and manage the data, it almost always comes back to extract its pound of flesh. I am sure there is that phase in the life of every model that you are just frustrated at the amount of time and effort it takes to get data right and even then you are not entirely sure whether you got the data right.

Phase 2: “WOW! I actually built a model and managed to create some predictions!”

The first light at the end of the tunnel happens when you have managed to get the data right, set up the modeling equation correctly (after several attempts at hacking away at the code) and actually ran the model to produce some predictions. And the predictions by-and-large look good and seem to actually make sense when you compare some of the standard metrics such as precision and recall, as well as when create deciles of twenty-tiles of the prediction range and are able to see some decently good predictions! That feeling of excitement and joy is amazingly good.

Phase 3: “Well, actually my predictions aren’t that good!”

The next phase of low happens when you try and examine your predictions at an individual level and discover that - by and large - your predictions are not very accurate at all. Overall, the model seems to work well but at an individual level, the predictions are really off. There are a few cases where the model absolutely nails the prediction but in nearly 60-70% of the cases, the prediction is off in either direction. Not catastrophically off but then off enough to cause some anxiety to the perfectionist in you.

Phase 4: “Phew! Actually the model didn’t do too badly”

Then you actually take the model and apply it to the problem you are trying to solve. So maybe you are looking at customer calls transcripts and trying to predict the likelihood of a follow-up call. Or you are looking at thousands and thousands of loan data and trying to predict the probability of default. Or what a digital experience adoption propensity is likely to look like around thousands of visitors. (Of course, I am assuming all along that the overall problem was structured correctly and the problem you are solving is indeed something worth solving, etc.)

Based on the model, you are hoping to take one or the other action. And then you find that the model actually makes a difference, that you have been able to create happier customers or more profitable loans or an overall better adoption of a digital experience. This feeling - that the predictive modeling solution you built - made an impact in the real world is absolutely spectacular.

I have had these moments of great satisfaction at the end of several such initiatives and there have been certainly situations when things have not exactly followed that script. In my next post, I will talk about the steps that you can take as a predictive modeler, that lead to great outcomes and certain other things that lead to less satisfactory outcomes.


Deepika Patil said...

Analogica data is one of the Best Big Data Analytics Company in pune, services provides advanced Big Data and analytics solutions in pune.

Anonymous said...

Analogica Data is one of the best Offshoring Company for Data Analytics in India, provide Big Data Analytics Solutions and Services in India and US.