After three hectic days at the Strata conference trying to appreciate
the poetry, I am on my way back on the Acela from New York to DC.
There was a ton to learn from the conference, and words can only do it
so much justice, but there is a set of learnings I want to share from
my perspective. Caveat: these are all colored by my knowledge, my
personal context and my organizational context, but I am sure many of
them will resonate with a lot of people. Another caveat: there is no
neat structure to what I am going to share, so treat it as such. So
here goes:
1. MapReduce as we know it is already behind us
MapReduce as a specific set of technologies written in Java (not as an
overall philosophy; MapReduce has indeed become a philosophy, much
like Agile) is already behind us. MapReduce 2.0 came out late last
year, and it is definitely an improvement on MapReduce 1.0. But when
it comes to large-scale ingestion of data and making it usable, the
mainstream has shifted to Apache Spark. What is surprising is that
Spark as a technology is fairly new and not very stable. But the pace
of technology evolution is such that people are finding uses for Spark
in a number of really relevant and creative ways. And in 3 years,
technologies like Spark will replace MapReduce almost entirely. (Even
though some people are going to argue there is a place for both.)
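To make the "philosophy" point concrete, here is a minimal sketch of
the map, shuffle and reduce steps as a word count in plain Python. It
is only an illustration: the function names (map_phase, shuffle_phase,
reduce_phase, word_count) are my own, and nothing here uses Hadoop or
Spark.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    # Run the three phases in sequence over an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "Data tools"]))
```

The same three steps carry over to Spark more or less directly: in the
PySpark RDD API they would typically be written with flatMap, map and
reduceByKey, which is part of why the idea can outlive the original
Java implementation.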
2. Using BigData tools vs investing in custom development on agile technologies is an important decision
With the emergence of the open source software movement, and the
ability to easily share software, learnings and approaches over a
number of internet-based platforms, it is no surprise that a lot of
startups see open source as an easy way to bootstrap their product
development. Over the years, open source software has become the norm
for driving product development and data infrastructure creation
within almost all tech and digital industry leaders.
With the Cambrian explosion of product development in the data space,
a lot of the products being released are tools or building blocks that
bring efficiencies to data processing and data pipelines. So an
organization that needs to harness BigData for its day-to-day needs
has a very important decision in front of it. Should it do custom
development on the generic open source technologies, and therefore
allow its solutions to evolve along with the underlying generic
technology, or should it bring in third-party tools for important
parts of its data processing? (This is a variant of the classic Build
vs Buy question, but has some nuances because of the open source
explosion.)
Each decision comes with its pros and cons. Working with tools
improves speed to market, but forces the buying organization to accept
the constraints that a tool is likely to impose. Working on generic
technologies removes this dependency and allows for natural product
evolution, but this comes at the cost of development time, lower speed
to market and potentially higher costs. My specific observation here
was around how my organization has chosen to ingest data into its HDFS
environment. Should we be doing custom development using some of the
open source data ingestion frameworks such as Apache Flume or Storm,
or should we use a product like Informatica that comes with a number
of desirable features out-of-the-box? These are not easy decisions,
and I think the whole Build vs Buy decision on BigData needs its own
blogpost.
3. Open source is here to stay
I think I might have said this before, but open source is here to stay
and is going through a Cambrian explosion. Enough said on that!
4. Innovation to new and dynamic technologies needs to be multi-threaded
As a relatively late adopter of the BigData platform, my organization
has been following a linear and established path to BigData adoption.
The goal has been to get to the low-hanging fruit of cost savings, by
taking spend away from investment in RDBMS platforms. It is a
perfectly legitimate goal to have, and I think we are going about it
in a very structured manner. But in a world of fast-evolving
technologies, this focus creates the risk of a blind spot around the
other use-cases of the technology within the overall ecosystem. In our
case, real-time data use-cases and streaming analytics are a big blind
spot from my vantage point. The risk is that by the time we reach the
low-hanging fruit by being systematic and focused, we will have lost a
lot of ground in other areas and will be similarly behind when the
next technology wave happens.
So my view here is that we need to be multi-threaded in our technology
adoption. We need to have specific goals and stay focused on them to
make these new technologies mainstream. But at the same time, we need
to be aware of other applications of the technology and make sure
there are investments in place to build our capabilities in the areas
that are not the immediate focus. We also need a SWAT team working on
even newer technologies and ideas that are likely to become mainstream
12 months from now.
That is just a smattering of my immediate thoughts from Strata. Like I
promised, they are not very organized, but I did want to share some of
my unvarnished opinions.