This is the next part in an ongoing series about the value of big data. The first post tried to draw a distinction between the value delivered by better analytics in general and by big data specifically. We talked about six specific applications of big data:
1. Reducing storage costs for historical data
2. Where significant batch processing is needed to create a single summarized record
3. When different types of data need to be combined to create business insight
4. Where there are significant parallel processing needs
5. Where there is a need to have capital expenditure on hardware scale with requirements
6. Where there are significantly large data capture and storage needs, such as data captured through automated sensors and transducers
In the previous post, I talked about how one of the applications of Big Data is better management of historical data. Low-cost storage and ready accessibility mean that historical data can be kept in a Hadoop cluster, eschewing legacy storage solutions like tape drives, which are less reliable and take longer to retrieve from.
In this post, I am going to talk about the case where there is a business need to create a summarized data table from a very large number of underlying event-level records. There are many use cases where this need exists. The most basic is reporting to aid management decision making: reporting requires information to be summarized across multiple business segments, typically at a weekly or monthly frequency. A good example is Personal Financial Management (PFM) systems, which classify credit card transactions and provide summaries by merchant category for a given month. The individual credit card transactions would be stored as individual records across multiple storage units, and a MapReduce job would run as a batch program that summarizes this information.
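To make that concrete, here is a minimal sketch of such a batch job using the Python mrjob library, which runs MapReduce jobs locally or over Hadoop Streaming. The input layout (a CSV line of customer_id, date, merchant category, amount) and the job name are assumptions made purely for illustration, not a description of any particular PFM system.

```python
# pfm_summary.py - a minimal sketch of a MapReduce job that rolls up
# credit card transactions by customer, merchant category and month.
# The CSV layout (customer_id, date, category, amount) is an assumption.
from mrjob.job import MRJob


class MRMonthlySpendByCategory(MRJob):

    def mapper(self, _, line):
        # One input line = one transaction record.
        customer_id, date, category, amount = line.split(",")
        month = date[:7]  # e.g. "2013-07" from "2013-07-15"
        yield (customer_id, category, month), float(amount)

    def reducer(self, key, amounts):
        # Total spend for this (customer, category, month) combination.
        yield key, sum(amounts)


if __name__ == "__main__":
    MRMonthlySpendByCategory.run()
```

Run locally during development with "python pfm_summary.py transactions.csv", or against a Hadoop cluster with the -r hadoop runner; the output is the customer-by-category-by-month summary that a PFM front end would then read.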
Another application is in creating segmentation or targeting variables for marketing and personalization campaigns. A third application, particularly relevant in the digital marketing and e-commerce world, is recommender systems. Recommender systems make product suggestions based on a customer's profile and past behavior. Given that these recommendations need to be made in milliseconds in the middle of a marketing campaign, running through all available records to extract the relevant piece of profiling information is not technically feasible. It is better to have a batch job run overnight that summarizes the information and creates a single record (with several hundred fields) for each customer. Read the IEEE Spectrum article "Deconstructing recommender systems" for a particularly good exposition on recommender systems.
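For illustration only, here is a tiny pandas sketch of what "one wide record per customer" means. The three spend columns stand in for the hundreds of fields a real profile would carry, and all the column names and values are assumptions.

```python
import pandas as pd

# Event-level transactions; in practice this would be the output of the
# overnight batch job rather than a small in-memory frame.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "category":    ["grocery", "travel", "grocery", "fuel", "fuel"],
    "amount":      [52.10, 430.00, 18.75, 40.00, 35.50],
})

# Pivot event-level rows into one wide row per customer: one column per
# merchant category, holding total spend. A real profile would add many
# more aggregates (counts, recencies, averages, and so on).
profile = tx.pivot_table(index="customer_id",
                         columns="category",
                         values="amount",
                         aggfunc="sum",
                         fill_value=0)

print(profile)  # one row per customer, one column per category
```

At serving time the recommender only has to fetch that one precomputed row, rather than scan the underlying transactions.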
Architecture of a data summarization application
So what would the data architecture of such a solution look like? Clearly, the Big Data portion of the overall stack, the transaction-level data, would reside in a Hadoop cluster. This gives the unbeatable combination of cheap storage and fast batch processing (by virtue of its MapReduce capabilities). The relative shortcoming of this system, its inability to provide random access to an outside application reading the data, is irrelevant here, because the sole objective of the Hadoop cluster would be to ingest transaction-level data and convert it into summary attributes through a batch program.
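As a sketch of that batch program, here is one way it could look in PySpark (a different engine from hand-coded MapReduce, but the same idea of a distributed batch aggregation over data in the cluster). The HDFS paths, column names, and chosen aggregates are all assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-summary").getOrCreate()

# Transaction-level data sitting in the Hadoop cluster; the path and the
# schema (customer_id, txn_date, category, amount) are assumptions.
tx = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/transactions/"))

# The batch summarization step: collapse event-level rows into one set of
# summary attributes per customer.
summary = (tx.groupBy("customer_id")
             .agg(F.count("*").alias("txn_count"),
                  F.sum("amount").alias("total_spend"),
                  F.max("txn_date").alias("last_txn_date")))

# Hand the summary off to the serving layer; Parquet here, but this could
# equally be a JDBC write into the RDBMS described below.
summary.write.mode("overwrite").parquet("hdfs:///data/customer_summary/")
```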
The summary table would need to be built on a traditional RDBMS platform, though NoSQL stores such as MongoDB could also do the job. The need here is fast random access for marketing applications, recommender systems, and other consumers of the summary information.
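On the serving side, all that matters is a table keyed by customer that supports fast single-row lookups. Here is a self-contained sketch using SQLite as a stand-in for whatever production RDBMS (or MongoDB-style store) is actually used; the table and column names mirror the illustrative PySpark output above and are equally assumptions.

```python
import sqlite3

# SQLite stands in for the production RDBMS; the schema is illustrative.
conn = sqlite3.connect("serving.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS customer_summary (
        customer_id   INTEGER PRIMARY KEY,  -- keyed for fast random access
        txn_count     INTEGER,
        total_spend   REAL,
        last_txn_date TEXT
    )
""")

# The overnight batch job would bulk-load its output here.
conn.executemany(
    "INSERT OR REPLACE INTO customer_summary VALUES (?, ?, ?, ?)",
    [(1, 42, 4820.50, "2013-07-14"), (2, 7, 310.25, "2013-07-12")],
)
conn.commit()

# What the marketing application or recommender does at request time:
# a single indexed lookup, not a scan over transaction-level data.
row = conn.execute(
    "SELECT * FROM customer_summary WHERE customer_id = ?", (1,)
).fetchone()
print(row)
conn.close()
```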
So to summarize, Big Data lends itself extremely nicely to creating data tables that aggregate transaction-level data into entity-level information (the most common entity being the customer). The work itself would be done through a batch job that takes advantage of MapReduce.
In my next post, I will start to touch upon how Big Data is ideally suited to processing different types of data, both structured and unstructured.