The Future of Big Data, Part 1

Stephen DeAngelis

January 30, 2013

Big Data is getting plenty of attention nowadays. Too much attention, according to some pundits. In fact, some pundits believe that there is a Big Data bubble that is going to burst. Other pundits believe that Big Data is already descending into what Gartner calls the “trough of disillusionment.” Still other pundits believe that the term Big Data should be unceremoniously done away with. Among the latter group is Leena Rao. She writes, “Let’s banish the term ‘big data.’” [“Why We Need To Kill ‘Big Data’,” TechCrunch, 5 January 2013] She explains why she feels as she does:

“Why have I grown to hate the words ‘big data’? Because I think the term itself is outdated, and consists of an overly general set of words that don’t reflect what is actually happening now with data. It’s no longer about big data, it’s about what you can do with the data. It’s about the apps that layer on top of data stored, and insights these apps can provide. And I’m not the only one who has tired of the buzzword. I’ve talked to a number of investors, data experts and entrepreneurs who feel the same way.”

I agree with Rao that what’s really important is what you can do with the data. Unanalyzed data is as useless as an unread book sitting on a library shelf. Rao reports that the term “big data” was first used by “Francis Diebold of the University of Pennsylvania, who in July 2000 wrote about the term in relation to financial modeling.” She believes that a decade is long enough for a term to be used, abused, and retired. The reason that the term still has legs, however, is that the data contains the gold and the amount of data that must be mined to find the gold is getting bigger every day. Data is the sine qua non of everything that follows. Rao doesn’t disagree that the data is both big and important. In fact, she writes that it is so important that it “is the key to most product innovation.” As a result, she asserts, every company that uses data is a “big data” company, which “doesn’t say much about the company at all.” She continues:

“According to IBM, big data spans four dimensions: Volume, Velocity, Variety, and Veracity. Nowadays, in the worlds of social networking, e-commerce, and even enterprise data storage, these factors apply across so many sectors. Large data sets are the norm. Big data doesn’t really mean much when there are so many different ways that we are sifting through and using these massive amounts of data. That’s not to under-estimate the importance of innovation in cleaning, analyzing and sorting through massive amounts of data. In fact, the future of many industries, including e-commerce and advertising, rests on being able to make sense of the data.”

Rao is looking for a new way to describe what is being done with large data sets (she writes, “let’s figure out a different way to describe startups that are dealing with large quantities of data”), but the fact remains that today’s data sets are large, and that is why the simple descriptor “big” is likely to remain.

Svetlana Sicular believes that there has been so much hype about big data that it “is at the peak of inflated expectations.” The only way for those expectations to go is down into Gartner’s “trough of disillusionment.” [“Big Data is Falling into the Trough of Disillusionment,” Gartner, 22 January 2013] If you are not familiar with the Gartner Hype Cycle, read my post entitled Overcoming the Hype: Making Your Supply Chain a Strategic Weapon. Rather than being discouraged about the future of big data, Sicular believes that disillusionment with the subject means that “big data technology is maturing.” Like Rao, Sicular understands that the important thing is learning how to unlock the insights that are contained in large data sets. She writes:

“Framing a right question to express a game-changing idea is extremely challenging: first, selecting a question from multiple candidates; second, breaking it down to many sub-questions; and, third, answering even one of them reliably. It is hard.
Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).”

At Enterra Solutions we use Artificial Intelligence to help us frame these questions. Our Sense, Think/Learn, Act™ system powers a Hypothesis Engine™ that can propose and explore interesting potential relationships it discovers on its own, and test them much more rapidly than humans can iterate. Our belief is that the current reliance on one-by-one human attempts to question an exponentially growing space of data is the main cause of this disillusionment. These kinds of technologies will help big data climb out of the trough of disillusionment as they continue to mature. In the meantime, Sicular reports, “According to the Gartner Hype Cycle, the next stop for big data is negative press.” Not everyone agrees with Sicular that big data is headed into the trough of disillusionment. In fact, Patrick Campbell believes big data is moving from “fad to favor in 2013.” [“Big Data Matters—CIOs Taking Charge!” Enterprise Tech Central, 10 January 2013] Campbell cites a statement by Thomas H. Davenport, a Visiting Professor at Harvard Business School and a Senior Advisor to Deloitte Analytics, that he found enlightening:

“When SAP generates more money from [Business Intelligence] BI and analytics than from its transactional suite, a major transition has taken place. When IBM has spent close to $20 billion on analytics-related acquisitions, it’s a permanently changed ball game.”

Campbell agrees with the other analysts cited above that what really matters is what you do with your data. “The price of Big Data and BI analytics—the ROI,” he concludes, “all depends on how well you implement your strategies and have access to the tools appropriate for your ‘Big Data.’” Ann Grackin, an analyst with ChainLink Research, notes that it is not surprising that big data is capturing a lot of headlines given the fact that so much data is being collected every second of every day. “The problem,” she writes, “is that accumulating all this data takes space. And analyzing it takes software. … The theory is that there are things to learn there — about customers, about markets, about innovation — that can mean bigger opportunities for us.” [“Big Data,” ChainLink Research, 17 April 2012] When considering how to deal with big data, one CIO told Grackin:

“What I care about is source, size, security and sense — that is making sense of it, or analytics. Just because there is data all over the place, I am not sure of where it comes from and if it tells us anything useful about our customers. And size? That is how much money I need in the budget to deal with all the databases the business users want. And security. I don’t want users downloading stuff with malware. My main issue with all these is what’s the point? Does all this data matter to us?”

Whether you prefer IBM’s four “Vs” (Volume, Velocity, Variety, and Veracity) or the CIO’s four “Ss” (Source, Size, Security and Sense), the goal is to make sense of large data sets while ensuring that the information being used is credible. It’s important because, as Grackin writes, “The data seems to be piling up.”

A database analyst named Tao Lin explained to Grackin, “What matters to users is a small fraction of that data, which is relevant to only him, or her.” In some cases, decision makers are only interested in being alerted when something goes wrong. “Some call this exception management,” writes Grackin, “but that is really what we are looking for.” I agree that management by exception is important, but it is only one use case for big data analytics. Edward Tufte, whom Grackin calls “the master of envisioning and displaying quantitative data,” agrees with Tao that data relevancy is essential for obtaining useful insights. He told Grackin, “People are chasing huge databases, but there is truly only one bit that might be important to know, track, and chart.” In other words, they agree with Rao that what is important is what you can do with the data (i.e., data management), not the size of the data set. Unfortunately, size does matter: the larger the data set, the more difficult it is to manage and analyze. Grackin continues:

“On the upside, we have noticed a very strong correlation between data management strategies, in general, and improved performance in business. Successful firms such as Amazon, Walmart, Dell, Apple, and many modest-sized organizations … embrace the value of information as a source of wealth, and not just for what it can tell us about the future. These corporations also find ways to make data actionable. They do not have aimless data collection schemes. Rather, their data collection is application driven, and, therefore, pertinent to managing their business processes. These firms have better cash positions and seem to have been in control of their supply chain due to adherence to data standards and communications technologies … which allows them to reduce their information cycle times. This, of course, contributes to the management of all that data. Conclusion: Big Data Is Big Business.”

That is probably the best five-word description of the future of big data. In the final two segments of this series, I’ll look at some thoughts on the future of big data offered by marketing technologist Scott Brinker. He agrees that there may be a big data bubble, but he claims that the future of big data is even bigger.