I am coming back to my topic: the business rationale for big data, or “Why does big data make sense?” In a previous post on the Business Applications of Big Data, I mentioned six specific applications and benefits of big data in the modern large organization. In doing so, I tried to be specific about the benefits of big data technology and to distinguish them from the generic benefits of data-driven analytics.
One of my pet peeves when I read about the benefits of big data is that they often relate to the benefits of data and analytics more generically or (if the author is trying to be at least somewhat intellectually honest) to unstructured data and text analytics. Take, for example, this excerpt from McKinsey’s report on big data. I am picking on McKinsey here because they consider themselves (and are considered, in some circles) to be the smartest guys in the room, and I’d have expected them to be a little more discerning in differentiating between data- and analytics-driven business insights and the somewhat narrow technical area that is big data.
McKinsey says in its report:
There are five broad ways in which using big data can create value. First, big data can unlock significant value by making information transparent and usable at much higher frequency. Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information … and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions …Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services … Fourth, sophisticated analytics can substantially improve decision-making. Finally, big data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).
Now, ALL of the points made are either too generic (‘can substantially improve decision making’, ‘create the next generation of products and services’) or are things that apply more generically to good data- and analytics-based business models (‘use of data at higher frequency’, ‘more transactional data’, ‘narrower segmentation and precisely tailored products and services’). And so for someone who is trying to understand specifically whether to stay with a traditional RDBMS or embrace big data, this kind of commentary is useless. What I am going to try to do is call out some of the benefits that are uniquely driven by the specific big data technologies.
The business reasons why big data is a useful idea for organizations to embrace and implement come down to a few specific things. All of these have to do with the fundamental technology innovation that drove big data’s growth in the first place. As David Williams, CEO of Merkle, explains, if big data was merely an explosion in the amount of data suddenly available, it would be called ‘lots of data’. There's clearly more to this phenomenon, particularly since lots of data has always existed. So what are these technology innovations that typify big data?
These are parallel storage and computation on commodity hardware, using open source software. Often, the hardware is centrally located and managed, and connected to the user community through high-speed internet connections. The data is therefore not local, but rather resides in a ‘cloud’. These innovations in turn translate to a number of benefits:
- Lower cost of storage (as compared to traditional technologies like database storage or tape storage)
- Lower latency in getting access to really old data
- Faster computing in situations where batch computation suffices (the operative words here are ‘batch’ and ‘suffices’). Random update and retrieval of individual records, and computation in real time, are not strengths traditionally associated with big data, though there are some hybrid providers that are now able to straddle real-time processing and batch processing somewhat.
- Flexible database schema, which makes the data infrastructure scalable in the columnar dimension (now, I am sure I made up that phrase). This has not been a direct technology innovation from the original big data architecture as envisaged by Yahoo! and Google, but rather can be considered part of the overall big data ecosystem
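To make the ‘parallel’ and ‘batch’ ideas in the list above a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the `partitions` data and the function names are made up, and a thread pool stands in for a cluster of commodity machines. It is not how any particular big data product is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each inner list stands in for one block of records
# stored on a separate commodity machine.
partitions = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok", "ok", "ok"],
]

def count_partition(records):
    """Map step: summarize one partition independently of the others."""
    return Counter(records)

def merge_counts(left, right):
    """Reduce step: combine two partial summaries into one."""
    return left + right

def batch_count(parts):
    # Every partition is scanned in full, in parallel; a thread pool
    # stands in for the cluster here (an assumption for illustration).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_partition, parts))
    return reduce(merge_counts, partials, Counter())

totals = batch_count(partitions)
print(totals)
```

The point of the sketch is the shape of the work: every partition is read end to end and summarized, which suits batch questions over a large history. Looking up or updating one individual record would still mean finding the right partition and scanning it, which is why that is not the traditional strength of this approach.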
It is the first of these benefits that leads to the first big business rationale for big data: better access to, and eventually better use of, historical data. The availability of a large amount of historical data translates to better analytics and better predictive models, all else being equal. Decisions can then be based on actual empirical data rather than on educated guesses.
Before big data, organizations did one of several things to manage the large amount of historical data they invariably built up over time. Some of them just threw the data away after a set retention period, typically 24-48 months. Others retained portions of the data and threw the rest of it away. So if a certain business operation generated 100 elements of performance data, organizations would retain the ‘important’ ones (the ones typically used for financial reporting and planning) and would throw the rest away. The third strategy was to keep the data but do so in an off-line medium like storage tapes. The problem with storage tapes is that they tend to degrade physically, and the data is often lost. Even when it is not, the data is simply too difficult to retrieve and bring back online, so analysts seldom take the trouble of chasing after it.
With the advent of big data, it is now possible to put historical data away in low-cost, commodity storage. This storage (a) is low-cost, (b) allows data to be retrieved relatively quickly (not necessarily on demand, as one would get from an operational data store, but not with a latency of several days either), and (c) does not degrade like tapes do. This is one big advantage of big data.
So, if your organization generates a lot of performance data, and the default strategy for managing this data load has been simply to throw the data away, then big data helps in creating an easily accessible storage mechanism for this data. The easy accessibility means that analysts and decision-makers in the organization can dig deep into the historical data and find patterns. This in turn is an enabler of smarter decisions. Big data therefore enables smarter decisions indirectly; it is not a direct contributor. It is the analytics built on long and reliable historical data that drive the smarter decision-making.