Sunday, August 21, 2011

Analyzing Tesco - the analytics behind a top-notch loyalty program

My interest within predictive analytics has been as much about the technology and the data mining techniques that can be applied to the data as about the business value that can be extracted from it. With this second interest in mind, I am going to embark on a series of a different kind of blog posts.

Instead of mostly talking about theory, I am going to share examples of how companies are using the power of analytics to know their customers better, anticipate their needs and ultimately become more profitable. One of the beacons in this space, about which many a volume has been written, is Tesco, the UK (and now increasingly international) retailing giant. What I am going to cover in this piece is the Tesco loyalty card: how it works, and the different ways in which a retailer can take advantage of the information base created by the card to generate economic value.

First, some background. Tesco hired the marketing firm dunnhumby to develop a new loyalty program, to enable it to grow in the UK market. The Clubcard launched in 1995 to nearly instant success, as Tesco enjoyed a large increase in customer loyalty and retention. Within the first five years, sales had risen over 50%.

The structure of the card, and how the data is collected

The data gathering for the loyalty program starts with a typical application, which might ask for some basic demographic information such as address, age, gender, the number of members in a household and their ages, and dietary habits. Against this basic information, purchase history is appended. This includes the goods shopped for, as well as visit history, both to stores and online.

Next, a number of summary attributes are computed. These include share-of-wallet information and information on the frequency and duration of visits, as well as customer preferences and tastes, as determined by some clever cluster analysis based on the purchase history of specific fast-moving products. See this link for a review of "Scoring Points", a book describing the Tesco loyalty program.
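To make the cluster-analysis idea concrete, here is a minimal sketch of how taste clusters might be derived from purchase shares. The category names, baskets and the tiny hand-rolled k-means are all illustrative assumptions, not Tesco's actual method:

```python
# Hypothetical sketch: deriving taste clusters from spend shares across
# a few product categories. Categories and baskets are invented.

CATEGORIES = ["organic", "budget", "convenience"]

def share_vector(basket):
    """Convert a {category: spend} basket into spend shares."""
    total = sum(basket.values())
    return [basket.get(c, 0.0) / total for c in CATEGORIES]

def kmeans(points, k, iters=20):
    """A minimal k-means, seeded on the first k points."""
    centroids = [p[:] for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each shopper to the nearest centroid (squared distance).
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        for j, members in enumerate(clusters):
            if members:  # Recompute centroid as the mean of its members.
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return centroids

baskets = [
    {"organic": 80, "budget": 10, "convenience": 10},
    {"organic": 75, "budget": 15, "convenience": 10},
    {"organic": 5,  "budget": 85, "convenience": 10},
    {"organic": 10, "budget": 80, "convenience": 10},
]
points = [share_vector(b) for b in baskets]
centroids = kmeans(points, k=2)
```

With these toy baskets, the two centroids separate into an "organic-leaning" and a "budget-leaning" shopper profile; the real exercise would run over thousands of fast-moving products rather than three categories.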

Tesco realized that better information leads to better results and created Crucible - a massive database of not only applicant information and purchase history, but also information purchased and collected elsewhere about participating consumers. Credit reports, loan applications, magazine subscription lists, the Office for National Statistics, and the Land Registry are all sources of additional information stored in Crucible.

To summarize, Tesco maintains information about:
1. Customer demographics
2. Detailed shopping history
3. Purchase tastes, frequency, habits and behaviours
4. Other individual level indicators obtained from public sources

Creating this database is an undertaking in itself. Many organizations realize the value of such detailed data and are able to spend the resources to get it; however, they do such a bad job of integrating the data and making it available to analysts that only a fraction of the power in the data is realized.

Technology challenges

What were some of the challenges faced from a technology standpoint? To start with, one of scale: specifically, how to scale up from an analytical lab to servicing 10 million customers. In the words of Clive Humby of dunnhumby, "we're very pragmatic, so to begin with, we worked on a sample of data. We'll find the patterns in a sample, and then look for that pattern amongst everybody, rather than just trying to find it in this huge data warehouse."
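The sample-then-apply workflow Humby describes can be sketched in a few lines. Everything here is invented for illustration - the synthetic customer base, the "pattern" (a visit-frequency threshold learned from baby-food buyers) and the sample size:

```python
# Illustrative sketch of "find the pattern in a sample, then look for it
# amongst everybody". All data and the learned rule are synthetic.
import random

random.seed(42)

# Full customer base: (weekly_visits, buys_baby_food) pairs.
population = [(random.randint(1, 7), random.random() < 0.3)
              for _ in range(100_000)]

# Step 1: learn a pattern on a small, cheap-to-analyze sample - here, the
# average visit frequency of baby-food buyers, used as a targeting threshold.
sample = random.sample(population, 1_000)
buyer_visits = [v for v, buys in sample if buys]
threshold = sum(buyer_visits) / len(buyer_visits)

# Step 2: apply the learned pattern across the whole base, which is a far
# cheaper operation than mining the full warehouse directly.
targeted = [cust for cust in population if cust[0] >= threshold]
print(f"threshold={threshold:.1f}, targeted {len(targeted)} of {len(population)}")
```

The point of the pattern is pragmatism: the expensive discovery step runs on 1,000 rows, and only a cheap filter runs on the full 10-million-customer base.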

Humby has revealed some interesting insights in this interview.

Tesco uses a hybrid mix of technologies: Oracle as the main data warehouse engine, SAS for the actual modeling, and White Cross and Sand Technology as the analytic engines for applying the learnings to larger volumes of data. Additionally, the technology group used a number of technologies and algorithms developed in-house. Nick Lansley has written a nice blog about the technology choices made by Tesco - with some filtering, of course.

And finally, the business value or the economic benefits
1. Loyalty

The first clear benefit is customer loyalty and the increased spend that comes from a customer moving most of their purchases to Tesco. The loyalty program incentivizes customers to steer a greater share of their monthly grocery spend to Tesco, which in turn explains the increase in Tesco's share of the total UK market from about 15-20% to about 30% between 1995 and about 2005. This is a clear objective of any loyalty program, and Tesco delivers on it brilliantly. Tesco does this by offering vouchers on associated products - so if a family is buying infant formula, it is quite a straightforward decision to offer them discounts on diapers and get the customer to move that part of their purchases to Tesco as well.

2. Cross-sells

The most immediate extension from increasing spend within one product category is cross-selling across product families. An example of this (building on the previous one) would be marketing a college-fund financial product to a family that has recently started buying infant food and diapers. The way Tesco would do this, I would imagine, is to have a family- or customer-level flag for "Has small children" or something of the sort. An alternative would be to sell Disney Cruises to a family with small children. In this case, Tesco would not only collect a channel fee from Disney for selling their cruises through its site but also a premium for being demonstrably targeted in its marketing.
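A household-level flag like "Has small children" might be derived from purchase history along these lines. The category names, the threshold and the helper are all hypothetical, just to show the shape of the idea:

```python
# Hypothetical sketch of deriving a cross-sell flag from purchase history.
# Category names and the min_baby_items threshold are invented.

BABY_CATEGORIES = {"infant_formula", "diapers", "baby_food"}

def has_small_children(purchases, min_baby_items=3):
    """Flag a household whose recent baskets contain enough baby items."""
    baby_items = sum(1 for p in purchases if p["category"] in BABY_CATEGORIES)
    return baby_items >= min_baby_items

history = [
    {"category": "infant_formula"},
    {"category": "diapers"},
    {"category": "baby_food"},
    {"category": "bread"},
]
print(has_small_children(history))  # three baby items -> True
```

Once computed, a flag like this becomes a cheap join key for any downstream campaign, whether it is a college fund or a family cruise.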

3. Inventory, distribution and store network planning

The first two applications are about knowing consumer needs better and targeting available products and services more effectively. The next benefit from this data comes from materials movement. By getting a precise handle on demand, and particularly by anticipating demand spikes in response to promotions, the company can do an effective job with its demand planning and manage the distribution pipeline efficiently from the manufacturing points to the distribution centers.

Also, the demographic (customer self-reported) and public information appended to the customer-level database creates a basis for inventory planning. So let's say Tesco wants to open a store in a region where a large number of families with young children reside: it becomes possible to anticipate the demand for baby products at that branch and stock up accordingly.

4. Optimal targeting and use of manufacturer promotions

Another area of value for Tesco is the optimal use of manufacturers' promotions, such as direct purchase discounts or multibuy-type schemes. At the outset, it might appear that retailers like Tesco would love manufacturers' coupons and rebates. Who wouldn't like greater foot traffic and purchase activity from a scheme whose entire cost is borne by the manufacturer? In reality, though, things are never as simple as that. Retailers don't really want to run too many promotions, because managing promotions (displays, new labeling, frequent restocking, possible overstocking and the cost of damaged or expired inventory) is very labor intensive and also adds to supply-chain costs.

So one of the areas that Tesco specializes in is promotion optimization. Which means: given the hundreds of promotions available at any given point in time, which 25-30 to pick, and what price to negotiate with the manufacturer. The optimization is based on:
- The cost of running the promotion including inventory costs and labor costs
- Local geography based factors - what kind of customers shop at a local store and what are their unique preferences
- Ensuring there’s something for everyone - ensuring every customer has a fair chance of getting a few promotional offers, given their typical purchase behaviour
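As a toy illustration of the selection problem above, here is a greedy sketch that ranks candidate promotions by net benefit and fills a fixed number of slots. The figures, names and the simple net-benefit rule are all invented; a real optimizer would also weigh local demographics and the "something for everyone" coverage constraint:

```python
# Toy sketch of promotion selection: given many candidates, pick the best
# few subject to a slot budget. All figures are invented for illustration.

promotions = [
    {"name": "cereal BOGOF",  "uplift": 12000, "cost": 4000},
    {"name": "detergent 20%", "uplift": 9000,  "cost": 2500},
    {"name": "soda multibuy", "uplift": 7000,  "cost": 3500},
    {"name": "bread 10%",     "uplift": 3000,  "cost": 2800},
]

def pick_promotions(candidates, slots):
    """Greedy selection by net benefit (uplift minus running cost)."""
    ranked = sorted(candidates, key=lambda p: p["uplift"] - p["cost"], reverse=True)
    # Keep only promotions that at least cover their own running cost.
    return [p["name"] for p in ranked[:slots] if p["uplift"] > p["cost"]]

chosen = pick_promotions(promotions, slots=2)
print(chosen)
```

Even this crude version captures the key insight: the retailer's own running costs (labor, restocking, spoilage) can make a manufacturer-funded promotion a net loser, so selection matters.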

5. Consumer insight generation and marketing those insights

A final area of economic value for Tesco is gleaning higher-level customer insights that other entities would be interested in. For example, Procter and Gamble would be EXTREMELY interested in knowing how households of different sizes, and at different points on the economic spectrum, buy and use laundry detergent; how that use changes with seasons and over time; and what the propensity is for such customers to buy and use related products such as, say, fabric softeners.

Given Tesco's vantage point and its detailed view of what a customer's purchases really look like, it becomes quite easy for Tesco to glean such insights from the data and sell the information to interested parties. This is another source of economic value for Tesco.

This post is getting really long, so let me stop here and summarize. We discussed the types of data that are gathered by a top-notch loyalty program like Tesco's, and the sources of economic value from the data Tesco gathers. In my next posts, I will talk about the potential value of such a program for Tesco and its comparable costs; some of the unique and, honestly, hard-to-replicate factors that have helped Tesco succeed in this space; and some of the competitive responses and how this area is evolving in the emerging SOcial, LOcal, MObile (or SoLoMo) world.

A set of interesting links about Tesco's loyalty program.

Wednesday, August 3, 2011

Tips for data mining - part 4 out of 4

My labor of love, which started nearly seven months back, is finally drawing to a close. In previous pieces, I have talked about some of the lessons I have learned in the field of data mining. The first two pieces of advice, covered in this post, were
1. Define the problem and the design of the solution
2. Establish how the tool you are building is going to be used

The next pieces, covered in this post, were
3. Frame the approach before jumping to the actual technical solution
4. Understand the data

In the third post in this epic story (and it has really started feeling like an epic, even though it has just been three medium-length posts so far), I covered:
5. Beware the "hammer looking for a nail"
6. Validate your solution

Now, based on everything I have talked about so far, you actually go and get some data and build a predictive model. The model seems to be working exceptionally well, showing high goodness-of-fit with the data. And there you have reached the seventh lesson about data mining, which is

7. Beware the "smoking gun"
Or: when something seems too good to be true, it probably is not true. When the model is working so well that it seems to be answering every question being asked, something insidious is going on - the model is not really predicting anything but just transferring the input straight through to the output. It could be that a field that is another representation of the target variable is being used as a predictor.

Let's take an example here. Say we are trying to predict the likelihood that a person is going to close their cellphone plan - or, in business parlance, the likelihood that the customer is going to attrite. Also, let's say one of the predictors used is whether someone called the service cancellation queue through customer service. By using "called service cancellation queue" as a predictor, we are in effect using the outcome of the call (service cancellation) as both a predictor and the target variable. Of course the model is going to rank-order extremely nicely and put everyone who met the service cancellation queue condition as the ones most likely to attrite.

This is an example of a spurious model - it is not even a bad or an overfit model. Not understanding the different predictors available (or rather, not paying attention to the way the data is being collected), and not justifying why each one is selected as a predictor, is the most common reason why spurious models get built. So when you see something too good to be true, watch out.
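A crude automated screen for this kind of leakage is to flag any predictor whose value maps almost one-to-one onto the target, like the cancellation-queue field above. The data, threshold and helper below are invented; the check only looks at truthy predictor values, which is enough for flag-type fields:

```python
# Sketch of a quick leakage screen: flag predictors where a truthy value
# predicts the target almost perfectly. Rows and threshold are invented.

def leakage_suspects(rows, target, threshold=0.99):
    """Return predictor names whose truthy values nearly determine the target."""
    suspects = []
    predictors = [k for k in rows[0] if k != target]
    for col in predictors:
        # Among rows where the predictor is truthy, how often is the target true?
        hits = [r[target] for r in rows if r[col]]
        if hits and sum(hits) / len(hits) >= threshold:
            suspects.append(col)
    return suspects

rows = [
    {"called_cancel_queue": 1, "intl_plan": 0, "attrited": 1},
    {"called_cancel_queue": 1, "intl_plan": 1, "attrited": 1},
    {"called_cancel_queue": 0, "intl_plan": 1, "attrited": 0},
    {"called_cancel_queue": 0, "intl_plan": 0, "attrited": 1},
]
print(leakage_suspects(rows, target="attrited"))
```

A flagged field is not automatically leakage - some genuinely strong predictors exist - but each suspect deserves exactly the "how is this collected?" scrutiny described above before it goes into the model.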

8. Establish the value upside and generate buy-in
Now let's say you manage to avoid the spurious-model trap and actually build a good model: one that is not overspecified, is independently validated, and uses a set of predictors that have been tested for quality and are well understood by the modeler (you). Now the model should be translated into business value in order to win the support of the different business stakeholders who are going to use the model or will need to support its deployment. A good understanding of the economics of the underlying business model is required to value the greater predictive capability afforded by the model. It is usually not too difficult to come up with this value estimate, though it might seem like an extra step at the end of a long and arduous model build. But it is a critically important step to get right. Hard-nosed business customers are not likely to be impressed by the technical strengths of the model - they will want to know how it adds business value by increasing revenue, reducing costs or decreasing the exposure to unexpected losses or risk.
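Continuing the attrition example, a value estimate can often be a back-of-envelope calculation like the one below. Every figure here (customer count, churn rate, offer cost, save rate, the model's decile concentration) is a hypothetical assumption, purely to show the shape of the argument a business stakeholder expects:

```python
# Back-of-envelope value sketch, all figures hypothetical: a churn model
# lets you target retention offers at the riskiest decile instead of everyone.

customers = 1_000_000
annual_value = 300        # revenue per retained customer (assumed)
base_churn = 0.10         # overall annual churn rate (assumed)
offer_cost = 20           # cost of one retention offer (assumed)
save_rate = 0.25          # fraction of targeted churners the offer retains

# Assume the model concentrates 40% of all churners in its top decile.
top_decile = customers // 10
churners_in_decile = int(customers * base_churn * 0.40)

untargeted_cost = customers * offer_cost      # offer everyone
targeted_cost = top_decile * offer_cost       # offer only the top decile
saved_value = churners_in_decile * save_rate * annual_value

print(f"offer cost drops {untargeted_cost:,} -> {targeted_cost:,}; "
      f"retained revenue = {saved_value:,.0f}")
```

The numbers are made up, but the structure is the point: the model's lift is translated into reduced campaign cost and retained revenue - the two quantities a hard-nosed business customer actually cares about.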

So, there. A summary of all that I have learned in the last 4-5 years of being very close to predictive analytics and data mining. It was enjoyable writing this down (even if it took seven months) and I hope the aspiring data scientist gets at least a fraction of the enjoyment I have had in writing this piece.

Monday, August 1, 2011

Good documentation about data - a must for credible analytics

One of the cardinal principles of predictive analytics is that you are only as good as the data that you use to build your analysis. However, another important principle is that the data handling processes also have to be well managed and generally free of error.

Recently, a set of incidents came to light that showed the damage that can be caused by indifferent data handling processes. This was in the field of cancer research, which points to the human cost of some of these mistakes. One of the popular recent techniques for the analysis of gene-level data in cancer research is microarray analysis. A primer on what this analysis involves can be found at this link.

Duke University cancer researchers promised some revolutionary new treatments for cancer. But when patients actually enrolled in trials, the results were disappointing. Then the truth came out: the analysis was done wrong, and the reported results stemmed from some elementary errors in data handling by the researchers. Two researchers, Baggerly and Coombes, who had to literally reverse-engineer the analytical approach used, concluded that some simple errors led to the wrong conclusions.

A few takeaways for a data scientist:
1. Data handling scripts and processes need to be checked and double-checked. Dual validation is a well-known technique; it is also known as a parallel run. The idea here is to have two independent sets of analysts or systems to process the same input data and make sure the outputs are the same.

2. Data handling needs to be well-documented. The approach used to arrive at a set of significant findings can never be shrouded in mystery, either intentionally or because of sloppy documentation. At best, it gives the appearance of slipshod and careless work. At worst, it appears like a deliberate deception. Neither of these impressions are good ones to make.
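The dual-validation / parallel-run idea in the first takeaway can be sketched very simply: two analysts implement the same data-handling step independently, and results are trusted only if the outputs agree. The normalization step and data below are illustrative assumptions:

```python
# Minimal sketch of a "parallel run": two independently written
# implementations of the same spec, compared before results are trusted.

def normalize_v1(values):
    """Analyst A: scale values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def normalize_v2(values):
    """Analyst B: same spec, written independently."""
    span = max(values) - min(values)
    base = min(values)
    return [(v - base) / span for v in values]

data = [4.0, 8.0, 6.0, 2.0]
out1, out2 = normalize_v1(data), normalize_v2(data)

# Compare element-wise within a tolerance; any disagreement halts the pipeline.
agree = all(abs(a - b) < 1e-9 for a, b in zip(out1, out2))
print("parallel run agrees:", agree)
```

The value of the technique is that the two implementations are unlikely to share the same elementary mistake - exactly the kind of error that slipped through in the Duke case.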

A summary presentation from Baggerly and Coombes about this issue can be found here.