Coming back to my topic on the business rationale for big data, or “Why does big data make sense?”: in a previous post on the Business Applications of Big Data, I mentioned six specific applications and benefits of big data in the modern large organization. While doing so, I tried to be specific about the benefits that come from big data technology and to distinguish them from the generic benefits of data-driven analytics.
One of my pet peeves when I read about the benefits of big data is that they often turn out to be the benefits of data and analytics more generically or (if the author is trying to be at least somewhat intellectually honest) of unstructured data and text analytics. Take, for example, this excerpt from McKinsey’s report on big data.
The reason I am picking on McKinsey here is that they consider themselves (and are considered, in some circles) to be the smartest guys in the room. I’d have expected them to be a little more discerning when it comes to differentiating between data/analytics-driven business insights and the somewhat narrow technical area that is big data.
McKinsey says in its report:
There are five broad
ways in which using big data can create value. First, big data can unlock
significant value by making information transparent and usable at much higher
frequency. Second, as organizations create and store more transactional data in
digital form, they can collect more accurate and detailed performance
information … and therefore expose variability and boost performance. Leading
companies are using data collection and analysis to conduct controlled
experiments to make better management decisions … Third, big data allows
ever-narrower segmentation of customers and therefore much more precisely
tailored products or services … Fourth, sophisticated analytics can
substantially improve decision-making. Finally, big data can be used to improve
the development of the next generation of products and services. For instance,
manufacturers are using data obtained from sensors embedded in products to
create innovative after-sales service offerings such as proactive maintenance
(preventive measures that take place before a failure occurs or is even
noticed).
Now, all of the points made here are either too generic (‘can substantially improve decision making’, ‘create the next generation of products and services’) or are things that apply more broadly to any good data/analytics-based business model (‘use of data at higher frequency’, ‘more transactional data’, ‘narrower segmentation and precisely tailored products and services’). So for someone who is trying to decide specifically whether to stay with a traditional RDBMS or embrace big data, this kind of commentary is useless. What I am going to try to do is call out some of the benefits that are uniquely driven by the specific big data technologies.
The business reasons why big data is a useful idea for organizations to embrace and implement come down to a few specific things. All of these have to do with the fundamental technology innovations that drove big data’s growth in the first place. As David Williams, CEO of Merkle, explains, if big data were merely an explosion in the amount of data suddenly available, it would be called ‘lots of data’. There is clearly more to this phenomenon, particularly since lots of data has always existed. So what are the technology innovations that typify big data?
These are parallel storage and computation on commodity hardware, using open source software. Often, the hardware is centrally located and managed, and is connected to the user community through high-speed internet connections. The data is therefore not local, but rather resides in a ‘cloud’.
These innovations in turn translate to a number of benefits:
- Lower cost of storage, as compared to traditional technologies like database storage or tape storage
- Lower latency in getting access to really old data
- Faster computing in situations where batch computation suffices (the operative words here are ‘batch’ and ‘suffices’). Random update and retrieval of individual records, and computation in real time, are not strengths traditionally associated with big data, though there are some hybrid providers that are now able to straddle real-time processing and batch processing somewhat.
- Flexible database schema, which makes the data infrastructure scalable in the columnar dimension (now, I am sure I made up that phrase). This was not a direct technology innovation from the original big data architecture as envisaged by Yahoo! and Google, but rather can be considered part of the overall big data ecosystem.
It is the first of these technological innovations that leads to the first big business rationale for big data: better access to, and eventually better use of, historical data. The availability of a large amount of historical data translates to better analytics and better predictive models, all else being equal. Decisions can be based on actual empirical data rather than on educated guesses.
Before big data, organizations did one of several things to manage the large amount of historical data they invariably built up over time. Some of them just threw the data away after a set retention period, typically 24-48 months. Others retained portions of the data and threw the rest away. So if a certain business operation generated 100 elements of performance data, organizations would retain the ‘important’ ones (typically the ones used for financial reporting and planning) and discard the rest. The third strategy was to keep the data, but on an off-line medium like storage tapes. The problem with storage tapes is that they tend to degrade physically, and the data is often lost. Even when it is not lost, the data is simply too difficult to retrieve and bring back online, so analysts seldom take the trouble of chasing after it.
With the advent of big data, it is now possible to put historical data away in low-cost, commodity storage. This storage (a) is low-cost, (b) allows data to be retrieved relatively quickly (not necessarily on demand, as one would get from an operational data store, but not with a latency of several days either), and (c) does not degrade like tapes do. This is one big advantage of big data.
So, if your organization generates a lot of performance data, and the default strategy for managing this data load has been simply to throw the data away, then big data helps by creating an easily accessible storage mechanism for it. The easy accessibility means that analysts and decision-makers in the organization can delve deep into the historical data and come up with patterns. This in turn is an enabler of smarter decisions. Big data therefore enables smarter decisions indirectly; it is not a direct contributor. It is the analytics built on long and reliable historical data that drive the smarter decision making.