Big Data or Smart Data – 99% of Companies
I think the current working definition of ‘Big Data’ is something like:
…so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
That’s great, but so far the only clear winners have been technology providers that specialize in handling large volumes of data. Most companies don’t have that much data, and if they did, nobody internally could really deal with it, let alone do something useful with it across business groups (say, IT and Marketing). And what about information security when shipping hundreds of gigabytes of data?
Another Popular Definition
http://whatis.techtarget.com/definition/3Vs - An alternative definition identifies 3 V’s
Volume, Variety and Velocity. We addressed volume above; velocity, for our purposes, can be tucked under volume. As for variety: if you can access multiple data sources and blend them, some interesting integrations might just be possible.
A Practical Concept for Big Data?
How do you know what data blends make sense for your organization? According to Gartner, you need to ask ‘Chewy Questions‘.
Now, Oreo, Chips Ahoy! and Nabisco surely have Chewy Questions, but how do other industries come up with such questions? Questions of this variety tend to be:
- Open ended
- Forward looking
- Scenario-based, i.e., tied to specific courses of action
- Integrative: they may combine external data in a unique way, or internal data through a novel approach
Sound like a Harvard Business Review brief? Sorry. Here’s a concrete example.
“How do you increase average order value for repeat customers?” — well, lots of questions come to mind:
- Why do people buy, when they buy again? If it’s because you send them coupons, maybe stop sending coupons? Will they still buy? Maybe A/B test it? With what tools?
- Are you raising prices on average? You’re in control…
- Do they cross-shop categories between purchases? (Do the follow-up emails match the predicted second-purchase category, if any trend is discernible?)
- Are you a multi-channel brand?
- Do seasonal factors affect behavior?
- Have you ever asked who the people are shopping for?
- Does web traffic drive retail activity?
- Does your product wear out?
Let’s think of the different data sources required to address each question:
- Customers table, transactions table, promo or discount code table, descriptions of the promos and the dates they ran, along with a table relating each transaction to its traffic source. Possibly an A/B testing tool.
- A transaction + item query, provided you have an accurate definition of revenue (net of discounts? gift cards? etc.)
- Same tables as above, but you need a dimension dealing with purchase number on the customer level of detail
- Do you have a unified customer definition in your CRM? Can you merge your retail CID with your web CID?
- Can you correlate sales with meteorological phenomena such as temperature or storm activity?
- Survey data tied to the CID?
- Store sales tied to web traffic data.
- Returns and complaints data. Call Center exit surveys, by product?
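To make the third item in the list above a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (transactions, customer_id, order_date, net_revenue) and the sample rows are hypothetical, made up purely for illustration; your own schema and revenue definition will differ. It derives a purchase number for each order at the customer level and then averages order value by purchase number.

```python
import sqlite3


def aov_by_purchase_number(rows):
    """Return [(purchase_number, avg_order_value, order_count), ...].

    rows: iterable of (customer_id, order_date, net_revenue) tuples,
    with order_date as an ISO 'YYYY-MM-DD' string.
    Table and column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        " id INTEGER PRIMARY KEY,"
        " customer_id TEXT, order_date TEXT, net_revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions (customer_id, order_date, net_revenue)"
        " VALUES (?, ?, ?)",
        rows,
    )
    # Purchase number = 1 + the count of the same customer's earlier orders.
    # Ties on order_date are broken by insertion id.
    query = """
        SELECT purchase_number,
               AVG(net_revenue) AS avg_order_value,
               COUNT(*) AS orders
        FROM (
            SELECT t.net_revenue,
                   1 + (SELECT COUNT(*) FROM transactions e
                        WHERE e.customer_id = t.customer_id
                          AND (e.order_date < t.order_date
                               OR (e.order_date = t.order_date
                                   AND e.id < t.id))
                   ) AS purchase_number
            FROM transactions t
        )
        GROUP BY purchase_number
        ORDER BY purchase_number
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample data: three customers, six orders.
sample = [
    ("c1", "2013-01-05", 40.0),
    ("c1", "2013-03-10", 60.0),
    ("c2", "2013-02-01", 20.0),
    ("c2", "2013-02-20", 30.0),
    ("c2", "2013-06-15", 50.0),
    ("c3", "2013-04-02", 30.0),
]

for purchase_number, aov, orders in aov_by_purchase_number(sample):
    print(purchase_number, round(aov, 2), orders)
```

If average order value climbs with purchase number, that is at least a hint worth chasing; whether it is driven by coupons, price increases or cross-shopping still requires the other tables on the list.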
So, bottom line: you’ve got a decent SQL + web data nightmare on your hands for a reasonably simple analysis. It’s doable, however. Maybe that constitutes variety, but certainly not volume: you can manage this with some combination of Google Analytics, Tableau and MySQL, or even R. Dare I say you might even use a pivot table along the way. Nothing fancy, eh?
Smart Data Management Gives You the Edge
I’d bet if you’re reading this and you had to get the above analysis together by the end of next week (or else) you’d be a little nervous. Again, not fancy, but is your data still disparate and difficult to access? Then again, it might be as simple as: ‘customers buy more because they ran out…we give them discounts because they like it‘. If that’s an acceptable description of purchase behavior within your organization, more power to you!
One of Tableau’s distinguishing features is automated ‘data blending‘: connect to two data sources and it will look for common fields to join on. They’re clearly addressing the variety issue (and to some degree the volume one as well).
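The general idea of blending on whatever two sources have in common can be sketched in a few lines of plain Python. To be clear, this is not Tableau’s actual mechanism, just a deliberately naive stand-in: the blend function, the column names and the sample figures are all hypothetical, and it matches on identical column names only.

```python
def blend(left, right):
    """Left-join two lists of dicts on every column name they share.

    Naive illustration only: matches on identical column names and
    keeps the first matching row from the right-hand source.
    """
    if not left or not right:
        return [dict(row) for row in left]
    # Columns present in both sources become the join key.
    keys = sorted(set(left[0]) & set(right[0]))
    if not keys:
        raise ValueError("no shared column names to join on")
    # Index the right-hand source by its key values.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), row)
    blended = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        merged = dict(match)
        merged.update(row)  # left-hand values win on any conflict
        blended.append(merged)
    return blended


# Made-up sample data: weekly store sales and weekly web visits.
store_sales = [
    {"week": "2013-W01", "store_sales": 1200},
    {"week": "2013-W02", "store_sales": 900},
]
web_traffic = [
    {"week": "2013-W01", "visits": 5400},
]

for row in blend(store_sales, web_traffic):
    print(row)
```

Rows with no match on the right simply keep their own columns, which is usually what you want when the second source is there to enrich the first rather than filter it.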
Bottom line: 99% of companies should focus on easy access to their most critical data sets and on blending the obvious ones to answer (or at least guide) specific questions or optimization problems.
Closing on an Ambiguous Note