A quick piece on how venture capital firms are investing in big data projects. I will aim to provide more comprehensive detail on all these companies and their business models in a forthcoming post. For now, I am not sure whether this supports or contradicts my speculation that big data is a big bubble.
Statistics Exchange
A place where mathematical models, statistical inference, the science of decision making come together to make the future a better place
Saturday, February 25, 2012
Wednesday, January 11, 2012
Wooga, gaming and the power of randomized testing - part 1/3
I read this really impressive piece in Wired UK magazine that highlighted the need for systematic randomized testing. The piece is about wooga, the social gaming company that uses randomized testing to develop its game ideas and concepts. Wooga is a competitor of Zynga and of the conventional video gaming giant, EA.
The randomized testing idea is core to predictive analytics. In the absence of randomized testing, a couple of issues can arise. The first is that the quantity being measured could be incorrectly estimated. The second is incorrect attribution of the effect being studied to causal factors. The incorrect attribution arises because non-randomized data is typically biased or “shaped” by the specific conditions or factors present when the data was generated. For example, if one were to take a week of shopping information to study a marketing promotion in the retail industry, and the week that was picked happened to fall a few days before Thanksgiving, one might be misled into thinking that (a) turkey sales make up a high percentage of the overall shopping basket in general and (b) the skew in the shopping basket distribution was caused by the promotion.
A number of analytically advanced companies have embraced this idea of randomized testing. One of the early pioneers in financial services was Capital One (check out this really neat link from the HBR), and the idea has since been adopted by practically every financial services company. Financial services is of course fertile ground for randomized testing because the large number of transactions creates a deep pool of data that lends itself to really powerful statistical modeling. In a couple of subsequent posts, I will talk about some best practices in this area and also some pitfalls to watch out for.
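To make the Thanksgiving example concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the spend model, the dollar figures (a base basket of 50, a holiday lift of 30, a true promotion lift of 5) and the function names are hypothetical and do not come from any retailer's data. It compares a naive estimate, where the promotion happens to run only in the holiday week, with a randomized one, where a coin flip decides who sees the promotion within the same week.

```python
import random


def simulate_spend(promoted, holiday_week, rng):
    """Hypothetical basket spend: base + holiday lift + true promo lift + noise."""
    base = 50.0
    holiday_lift = 30.0 if holiday_week else 0.0   # assumed seasonal effect
    promo_lift = 5.0 if promoted else 0.0          # assumed true promotion effect
    return base + holiday_lift + promo_lift + rng.gauss(0, 10)


def mean(values):
    return sum(values) / len(values)


def naive_estimate(rng, n=20000):
    """Promotion ran only in the holiday week; baseline is an ordinary week."""
    promo = [simulate_spend(True, True, rng) for _ in range(n)]
    no_promo = [simulate_spend(False, False, rng) for _ in range(n)]
    return mean(promo) - mean(no_promo)


def randomized_estimate(rng, n=20000):
    """Same holiday week for everyone; a coin flip assigns the promotion."""
    treated, control = [], []
    for _ in range(n):
        promoted = rng.random() < 0.5
        group = treated if promoted else control
        group.append(simulate_spend(promoted, True, rng))
    return mean(treated) - mean(control)


rng = random.Random(42)
naive = naive_estimate(rng)
randomized = randomized_estimate(rng)
print(f"Naive estimate of promotion lift:      {naive:.1f}")
print(f"Randomized estimate of promotion lift: {randomized:.1f}")
```

Under these assumptions the naive comparison folds the holiday effect into the promotion effect and lands near 35, while the randomized comparison recovers something close to the true lift of 5, because the holiday affects the treated and control groups equally.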
Friday, January 6, 2012
Some interesting links
Prof. Raghu Rajan from the University of Chicago has written this interesting piece on the Fault Lines blog about the build-up that finally ended in the financial crisis. He explains lucidly why Keynesian spending might not work as well this time around.
The other interesting write-up is about secular shifts in the unemployment picture in developed countries and some of the underlying causes. From the MIT Technology Review.
Excellent food for thought along with this piece from Joseph Stiglitz that I summarized here.
Labels:
artificial intelligence,
Computing,
Great recession
Thursday, January 5, 2012
Big analytics or big bubble?
As a practitioner of predictive analytics over the last 8-10 years, I find it fascinating to sit back and observe the way the field is gathering attention and importance. Most recently, NPR did a piece on analytics. The piece was about Gary Loveman, who is now the CEO of Caesars Entertainment; it is a podcast and the link is here. I loved this line from the story:
There are three things that can get you fired from Caesars: Stealing, sexual harassment and running an experiment without a control group.
NPR followed up with a couple of other pieces: one was an opener on Big Data and the other was about the search for analytic talent that can make sense of this Big Data. What made the second article somewhat quirky was that it profiled DJ Patil, a mathematician who searches for, tracks and ultimately recruits these data geniuses using - you guessed it - data!
So did the WSJ, just yesterday. The WSJ talked about two analytics consulting firms: Mu Sigma and Opera Solutions, getting substantial amounts of venture funding (think large 8-digit numbers) to expand in this space.
This is the point where I start to get really worried about big data and the b-word. Is this wave of big data, social media and mega-analytics a rapidly building bubble that is bound to pop in some way and leave more than a few people disappointed? I have no doubt that the idea is powerful and transformational - the idea being that predictive analytics sitting on top of data can drive better business decisions, better customer insights and resource efficiencies, and improve life for us in a holistic kind of way.
But when big media gets on the bandwagon and the subject goes from being discussed in technical journals to being discussed by the WSJ and NPR, I begin to smell some frothiness.
Wednesday, December 28, 2011
Car Insurance savings and too-clever marketing
A quick rant post.
I have been reflecting a bit on GEICO, Progressive and others claiming that you can save a lot of money (15%, so many dollars) by switching to that company. It is a highly deceptive form of advertising, and here's why.
First, the marketing message taken at face value seems to imply causation: switch to company X and you will save money. In reality, the sequence of events is the opposite. People typically shop for a quote, and when they find a quote that saves them money over what they currently pay, they switch. So it is likely that for every person who switches, there are one or more people who don't switch because they don't save any money or they save too little for it to be worth the hassle. So to say "switch and you will save money" is somewhat disingenuous. Only some people save money with Company X, and they are the ones who switch.
The second part of the deception comes in the dollar amount of the savings. This information is typically gathered by surveying customers who have switched. Why is this deceptive? Well, because a number of behavioral economics studies have shown that we human beings tend to rationalize. We tend to give ourselves more credit than is necessary or justifiable. This manifests itself in a number of ways, such as most people thinking they are above-average drivers, people over-estimating the investment returns they make and so on. So when a customer has made what he or she thinks is the extremely smart decision to switch, the customer is also likely to over-estimate the savings realized, out of pride in the decision just taken. And so it is very likely that the savings number is inflated to some extent.
So "save x% by switching to GEICO" is actually a smart ploy to get people to ask for a GEICO quote. It doesn't hurt at all to get one, in an extremely crowded marketplace. But promising savings in the language that these companies use doesn't seem very above board.
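The selection effect described above can be sketched with a small simulation. This is only an illustration under made-up assumptions: the premium distribution (mean 1000, standard deviation 150), the 50-dollar hassle threshold and the function name `simulate_switchers` are all hypothetical. The key assumption is that Company X is no cheaper on average than anyone's current insurer; each shopper's quote is drawn from the same distribution as the current premium, and a shopper switches only if the quote beats the current premium by more than the threshold.

```python
import random


def simulate_switchers(n_shoppers=100_000, hassle_threshold=50.0, seed=7):
    """Return (avg saving over all shoppers, avg saving among switchers, share switching)."""
    rng = random.Random(seed)
    all_savings = []
    switcher_savings = []
    for _ in range(n_shoppers):
        current = rng.gauss(1000, 150)  # assumed current annual premium
        quote = rng.gauss(1000, 150)    # Company X quote, same distribution
        saving = current - quote
        all_savings.append(saving)
        # Only shoppers who save more than the hassle is worth actually switch.
        if saving > hassle_threshold:
            switcher_savings.append(saving)
    avg_all = sum(all_savings) / len(all_savings)
    avg_switchers = sum(switcher_savings) / len(switcher_savings)
    share = len(switcher_savings) / n_shoppers
    return avg_all, avg_switchers, share


avg_all, avg_switchers, share = simulate_switchers()
print(f"Average saving across all shoppers: {avg_all:.0f}")
print(f"Average saving among switchers:     {avg_switchers:.0f}")
print(f"Share of shoppers who switch:       {share:.0%}")
```

Even though the average saving across all shoppers is close to zero by construction, the average among those who switched is comfortably positive, simply because switching is conditional on having found a saving. The sketch covers only the selection effect; the survey over-estimation discussed in the second point would inflate the reported figure further.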
Monday, December 12, 2011
Great Recession - A new theory linked to productivity improvement
I wrote a couple of years back about what has come to be known as the Great Recession of the twenty-first century. I remarked that the recession appeared to show no signs of abating, and recent events seem to have borne that out. While GDP growth in the US is in positive territory, it is only barely so. And the problems in Europe and a couple of natural disasters affecting Asia (the earthquake in Japan and the flooding in Thailand) have put the brakes on the emerging markets engine that was pulling the world economy along for the last 4 years.
In the meantime, a number of well-argued articles and books have been written about the genesis of the crisis, and they have largely focused on the financial sector, the US mortgage market and the excesses there. The Nobel Prize winning economist, Joseph Stiglitz, approaches this issue from a slightly different angle in a recent write-up in Vanity Fair. Stiglitz argues that the Great Recession has its roots in something more benign than mortgages gone toxic. It lies in the productivity increases of the last two decades, which caused a large number of job categories employing very large portions of the labor force to basically become redundant in the economy. What is interesting about this theory is that (Stiglitz argues) this is exactly what happened leading up to the Great Depression. The productivity improvements now are in manufacturing and services, whereas the productivity improvement then was in agriculture. To quote: "In 1900, it took a large portion of the U.S. population to produce enough food for the country as a whole. Then came a revolution in agriculture that would gain pace throughout the century—better seeds, better fertilizer, better farming practices, along with widespread mechanization. Today, 2 percent of Americans produce more food than we can consume."
Extremely interesting article and a forcefully made argument on the cause of the crisis and what could be done to solve it.
Saturday, December 10, 2011
Computing based on the human brain - the answer to Big Data?
A slight detour from my usual subjects around predictive analytics. I came across this recent article that points to the future direction of modeling and predictive analytics in general. That direction is the move away from the current model of computer design, based on the famous von Neumann architecture, to something much more similar to the thing that computing, modeling and decision making are ultimately designed to emulate, viz. the human brain.
First some background. Computer architecture has consistently followed the classic von Neumann design. Without getting into too many details, what the architecture boils down to is a separate processing unit (known variously as CPU, ALU, microprocessor) and a separate memory unit, connected by a communication channel called a bus. This architecture has served computing well over the past 50 years, and has now brought the computer within reach of every single human being on Earth. The fact that 2-year-old toddlers are extremely adept with the Apple iPad is testimony to the success of the von Neumann model. After all, nothing succeeds like success. Even as processor chips have become more advanced and started incorporating their own internal memory module (called cache memory), the von Neumann architecture has been faithfully replicated. But successful doesn't mean ideal or optimal or even efficient. The burn of the laptop on my thigh as I type this post is an indication that the current computing model, while successful, is also an extremely power-hungry one. The IBM Watson machine, famous for playing and beating human opponents in Jeopardy, is also famous for consuming roughly 4,000 times the power of its human competitors. The human brain functions on about 20 watts of power while Watson consumes more than 85,000 watts. And all that Watson can do is play Jeopardy. The human brain can do a lot more, like writing, recognizing patterns, expressing and feeling emotion, negotiating traffic, even designing computers!
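As a quick sanity check on the power comparison, the ratio can be worked out from the two figures quoted above. The variable names are just for illustration, and since the Watson figure is stated as "more than 85,000 watts", the result is a lower bound rather than an exact number.

```python
watson_watts = 85_000   # figure quoted in the post (a lower bound)
brain_watts = 20        # approximate figure quoted in the post

ratio = watson_watts / brain_watts
print(f"Watson draws at least {ratio:,.0f} times the power of a human brain")
```

That works out to 4,250, consistent with the "4000 times" figure in round numbers.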
So what might a more efficient model look like? Well, it looks a little more like the human brain. In the human brain, logical problem solving, thinking and memory are all managed through one element of computing infrastructure, so to speak: the neuron, interconnected through synapses. And that is the model being pursued by IBM in collaboration with Cornell, Columbia, the University of Wisconsin and the University of California, Merced. The project is also funded by DARPA, and more details can be found at the link at the start of the page. The big a-ha moment, according to the project director and IBM computer scientist Dharmendra Modha (in the middle of a vacation, no less), was to drive the brain-inspired computing project through the fundamental design of the processor chip, i.e. the hardware, rather than through software. To quote some details from the New York Times article by Steve Lohr:
The prototype chip has 256 neuron-like nodes, surrounded by more than 262,000 synaptic memory modules. That is impressive, until one considers that the human brain is estimated to house up to 100 billion neurons. In the Almaden research lab, a computer running the chip has learned to play the primitive video game Pong, correctly moving an on-screen paddle to hit a bouncing cursor. It can also recognize numbers 1 through 10 written by a person on a digital pad — most of the time.
Why is this relevant to predictive analytics?
What is a mention of this project doing in a predictive analytics blog? It has to do with Big Data. Online, mobile, geo-spatial and RFID technologies are creating streams of data in amounts that would have been impossible to conceptualize even a decade ago. As the availability of data increases and conventional computing and storage infrastructure gets overwhelmed, we will have to rely on a distributed memory storage and computing set-up that is more similar to the human brain. A space worth watching.
[Image: IBM Watson - Super-computer or energy hog?]
