In this post, I am going to start to elaborate on why big data makes sense. Now, this clearly doesn’t sound like a ground-breaking insight. You can google “big data” and come up with literally hundreds of articles that will invariably say that the amount of data generated in the world exceeds the storage capacity available. That customers are generating petabytes of data through their interactions, feedback, etc. That the cost of computing and storage is a fraction of what it was even ten years ago. That Google, Amazon, and Facebook invested in big data infrastructure by setting up commodity servers.
But what I have personally found missing in all of this megatrend information is a clear articulation of why a big company should embrace big data. There are a number of good reports and industry studies on the subject, and the McKinsey report on big data is an exceptional read (the graphic above is derived from the McKinsey Global Institute’s study on big data) – but all of them spend an extensive amount of time making the case for big data technologies, and not enough time, in my opinion, on the business rationale that makes it inevitable for an organization to invest in big data.
So in my understanding of the space, what are some of these elements of business rationale that support investment in big data? (I have to qualify my statement: these would apply to a typical large organization that already has a well-established RDBMS or other traditional data infrastructure. For a start-up, using big data technologies for one’s data infrastructure is a no-brainer decision. The question of rationale comes up when an organization has already invested considerably in traditional data infrastructure and where introducing big data technologies into the overall ecosystem is not going to be trivial.)
There are six specific areas where I have been able to find a sound business rationale for investing in big data. These are:
1. Where storage costs for historical data need to be reduced, so that data can be retained for extended periods and remain readily accessible
2. Where significant batch processing is needed to create a single summarized record (for different downstream business decisions)
3. Where different types of data need to be combined to create business insight or, to get slightly more specific, to create a single summarized customer-level record
4. Where there are significant parallel processing needs
5. Where capital expenditure on hardware needs to scale with requirements
6. Where there are significant data capture and storage needs
In subsequent posts, I will make these different elements of business rationale tangible through specific business situations.