Tuesday, September 30, 2014

Real-time analytics infrastructures

As the BigData (Hadoop) explosion has taken hold, architectures that provide analytic access to ever larger data for ever deeper insights have started becoming the norm in many top data-science-driven organizations. One example is the approach used at LinkedIn - the LinkedIn analytics blog is typically rich with ideas on how to approach analytics.

What are some of the challenges these architectures are trying to tackle?

- The need to organize data from different data sources around one single entity. In most cases, this entity is an individual. In the case of LinkedIn, it is the professional member. For Facebook, one of its 1.2 billion subscribers. For a bank, one of its customers. While the actual integration of the data might sound trivial, the effort involved is highly manual. Especially in legacy organizations like banks, which have different source systems (managed by different vendors) performing their billing, transaction processing, bill-pay, etc., the complexity of the data that comes into the bank in the form of raw files can be truly mind-boggling. Think multiple different data transfer formats (I recently came across the good old EBCDIC format), files with specific keys and identities relevant only to that system, and so on. These files need to be converted into a common format that is readable by all internal applications and organized around one internal key (a minimal sketch of this kind of normalization appears after this list).

- Next, the need for the data to be value-enhanced in a consistent manner. Raw data is seldom useful to all users without some form of value addition. This value addition could be something simple, e.g. taking the relationship opening date and converting it into a length-of-relationship indicator: say the relationship opening date is 1/1/2003, then the length of relationship is 11 years. Or it could be a more complex synthetic attribute that combines multiple raw data elements. An example is credit card utilization, which is the balance divided by the available credit limit. The problem with this kind of value enhancement is that different people could choose to do it in different ways, creating multiple versions of the same synthetic attribute in the data ecosystem, which can be confusing to the user. Creating a data architecture that allows these kinds of synthetic attributes to be defined once and then used multiple times can be a useful solution to this problem (see the second sketch after this list).

- The need to respond to queries to the data environment within an acceptable time interval. Also known as the service level (or SLA) that an application demands, any data product needs to meet business or user needs in terms of the number of concurrent users and query latency. The raw HDFS infrastructure was always designed for batch processing, not for real-time access patterns. Enabling these patterns requires the data to be pre-organized and processed through some kind of batch approach so as to make it consumption-ready. That, combined with the need to maintain close to real-time data relevance, means that the architecture needs to use elements beyond just the basic Hadoop infrastructure.
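To make the first challenge concrete, here is a minimal sketch (in Python, with invented system names, field names, and keys) of what normalizing records from two source systems onto one common internal key might look like:

```python
# A minimal, hypothetical sketch: normalize records from two source
# systems onto one common schema keyed by an internal customer ID.
# All system names, field names, and keys are invented for illustration.

# Cross-reference from (source system, source key) to the internal key.
# In practice this would be a maintained master-data/identity service.
KEY_XREF = {
    ("billing", "B-10045"): "CUST-000123",
    ("billpay", "u_8872"): "CUST-000123",
}

def normalize_billing(rec):
    """Map a billing-system record to the common internal format."""
    return {
        "customer_id": KEY_XREF[("billing", rec["acct_no"])],
        "source": "billing",
        "balance": float(rec["CUR_BAL"]),
    }

def normalize_billpay(rec):
    """Map a bill-pay record to the same common format."""
    return {
        "customer_id": KEY_XREF[("billpay", rec["user_id"])],
        "source": "billpay",
        "balance": float(rec["amt_due"]),
    }

raw = [
    ("billing", {"acct_no": "B-10045", "CUR_BAL": "2500.00"}),
    ("billpay", {"user_id": "u_8872", "amt_due": "120.50"}),
]
normalizers = {"billing": normalize_billing, "billpay": normalize_billpay}
unified = [normalizers[system](rec) for system, rec in raw]
print(unified)  # both records now share customer_id CUST-000123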

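And for the second challenge, a sketch of the "define once, use many times" idea for synthetic attributes (again hypothetical Python; the two attributes are the ones from the examples above):

```python
from datetime import date

# Hypothetical sketch: one shared registry of synthetic-attribute
# definitions, so "length of relationship" and "utilization" are
# computed the same way everywhere instead of being re-derived ad hoc.

def relationship_length_years(record, as_of=date(2014, 9, 30)):
    """Whole years since the relationship opening date."""
    opened = record["relationship_open_date"]
    return as_of.year - opened.year - (
        (as_of.month, as_of.day) < (opened.month, opened.day)
    )

def card_utilization(record):
    """Balance divided by available credit limit."""
    return record["balance"] / record["credit_limit"]

SYNTHETIC_ATTRIBUTES = {
    "relationship_length_years": relationship_length_years,
    "card_utilization": card_utilization,
}

customer = {
    "relationship_open_date": date(2003, 1, 1),
    "balance": 2500.0,
    "credit_limit": 10000.0,
}
enriched = {name: fn(customer) for name, fn in SYNTHETIC_ATTRIBUTES.items()}
print(enriched)  # {'relationship_length_years': 11, 'card_utilization': 0.25}
```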
These are just some of the reasons why BigData applications and implementations need to pay special attention to the architecture and the choice of the different component systems.

Tuesday, September 23, 2014

A/B Testing - ensuring organizational readiness

In my previous post on the subject of A/B testing, I talked about the need for an operational and technical readiness assessment before one embarks on A/B testing. It is essential to ensure that data flows in the overall system are designed well enough that user behavior can be tracked. The measurement system also needs to be robust enough not to break when changes to the client (browser or mobile) are introduced. In practice, this is achieved by a combination of a robust platform and disciplined coding practices when introducing new content.

But just as important as operational/technical readiness is organizational readiness to embrace Test and Learn. Here are a few reasons why an organization might not be ready (despite mouthing all the right platitudes).

First, an inability to recognize the need for unbiased testing in the "wild". A lot of digital product managers tend to treat usability studies, consumer research/empathy interviews, and A/B testing as somewhat interchangeable ideas. Each of these techniques has a distinct use, and they need to complement each other. Specifically, A/B testing achieves the goal of evaluating a product or a feature in an environment that is most like what a consumer is likely to experience. There is no one-way mirror, no interviewer putting words in your mouth - it is all about how the product works in the context of people's lives and whether it proves to be useful.

To remedy this, we have had to undertake extensive education sessions with product managers and developers about the value of A/B testing and about building testing capability into the product from the get-go. While people deep in analytics tend to find testing and experimentation a natural way of working, this approach is not obvious to everyone.

The second reason why A/B testing and experimentation are not embraced as they need to be is risk aversion. There are fears (sometimes justified) that doing something different from the norm will be perceived by customers as disruptive to the experience they are used to. Again, this is something that needs constant education. Also, instead of doing a 50/50 split, exposing the new experience to only a small number of visitors or users (running into several thousands but, in all, less than a few percentage points of the total traffic a site would see) is the way to go, as in the sketch below.
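Here is a minimal sketch of how such a small, deterministic allocation could work (hypothetical Python; the hashing scheme and the 2% treatment share are illustrative assumptions, not any particular testing tool's API):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_pct: float = 2.0):
    """Deterministically bucket a user into control or treatment.

    Hashing (experiment, user_id) gives every user a stable bucket in
    [0, 100); only users falling below treatment_pct see the new
    experience, so exposure stays at a small share of total traffic.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0  # roughly uniform in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"

# Example: a 2% exposure rather than a 50/50 split.
counts = {"treatment": 0, "control": 0}
for i in range(100_000):
    counts[assign_variant(f"user{i}", "new_homepage")] += 1
print(counts)  # roughly 2,000 in treatment, 98,000 in control
```

Because the bucket is derived from the user ID rather than drawn at random per visit, each user sees a consistent experience across sessions, which keeps the test readable.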

Additionally, having a specific "testing" budget agreed upfront and ensuring transparency around how the budget is getting used is an excellent way to mitigate a lot of these mostly well-meaning but also mostly unnecessary concerns.

What do you think about organizational and technical readiness? How have you addressed it in your organization while getting A/B testing off the ground? Please share in the comments area.

Friday, September 12, 2014

A/B Testing - some recent lessons learned - first part of many

We have been making a slow and steady journey towards A/B testing in my organization. There is no real need to explain the value of or need for A/B testing. Experimentation is quite simply the only way to distinguish causation from correlation - and the only real way to measure whether any of the product features we build actually matter.

In the past 12 months, we have had some important lessons about testing and, importantly, about the organizational readiness required before you go and buy an MVT (multivariate testing) tool and start running A/B tests at scale. These are:

1. Ensuring organizational and infrastructural readiness
2. Building a culture of testing and establishing proofs of concept
3. Continuous improvement from A/B testing at scale

In my opinion, the first and most important step is creating the baseline in terms of organizational and infrastructural readiness. Despite the best intentions of learning from testing, there can be a number of important reasons why testing just does not get off the ground. 

A poor measurement framework is one such big reason. An online performance measurement solution such as Adobe SiteCatalyst is only as good as the attention to detail and robustness of its implementation. In our case, though the overall implementation was useful in giving some good online behavior measurement, the attention to detail in ensuring every single action on the site was measurable was just not there. As a result, a few initial attempts at testing proved to be failures - meaning the tests were not readable and needed to be abandoned. Not only was this wasted testing effort; it was also a meaningful setback that reinforced another belief in the organization: that testing is both risky and unnecessary for an organization that gets customer research and usability right.

This brings me to the next part of readiness - organizational readiness - which will be the subject of my next post.
