Big Data Success Requires Data, Talent, and Analytics

Stephen DeAngelis

May 19, 2014

“Success in the big data world is far more than what many of the software and hardware vendors would have you believe,” asserts Gary Drenik, CEO of Prosper Insights & Analytics. “Simply buying software does not qualify as being in the big data world.” [“Three Legs Of Big Data Stool Needed For Success,” Forbes, 23 April 2014] As his headline proclaims, Drenik believes that companies have to put in place three foundational “legs” if big data projects are going to be successful. Those three legs coincide fairly well with traditional thinking about business success, which requires people, processes, and resources. The three legs of the big data stool discussed by Drenik are data sourcing (resources), skilled humans (people), and analytics (processes). Concerning data sourcing, he writes:

“Data sourcing is not the same as just capturing Google data or Twitter feeds. Data sourcing requires knowledge of all the potential data streams available in order to build the most accurate and complete outcomes for decision making. A limited focus on the vast digital data streams generated online and through social networks will only provide an incomplete, and somewhat biased view. These digital sources, while valuable, are oftentimes third party data which require numerous unverified assumptions be made before it can be relied upon. Large amounts of publicly available data provided directly from consumers, via government sources, survey data and transactional data (first party data) are often overlooked. These data sources offer invaluable insights that are much more specific, detailed and accurate than digital third party data. Another advantage is these data sets oftentimes cover years and can be trended, correlated and integrated with the new digital data to provide a more robust outcome. Eliminate either of these data sources, and you won’t have this leg of the stool.”

At the dawn of the information age, it didn’t take long for programmers to coin the adage “garbage in, garbage out.” Bad data could skew results so badly that the results were worthless. In the era of big data, things have changed a bit. Because there is so much data available, a handful of bad records can’t skew big data results as badly as the same records can skew results drawn from a smaller data set. In that sense, the more data the better. Sometimes, however, the very volume of data that can (or must) be analyzed appears so daunting that it paralyzes companies. Wally Powers, Director at West Monroe Partners, states, “Many organizations are finding that instead of a ‘Big Data’ problem, they have hundreds of ‘Little Data’ problems, or ‘molehills’ of unconnected data around the world.” [“You Don’t Have a Big Data Problem. You Have Hundreds of Little Data Problems,” SupplyChainBrain, 28 February 2013] Drenik and Powers would obviously agree that companies must understand what data they need to analyze and know where they can obtain it. Then, as Powers notes, the challenge becomes “shoveling together all those molehills into one cogent, structure of data.” Concerning the second leg of his big data stool (skilled human resources in both technology and marketing), Drenik writes:

“This is where the ball gets dropped by many who believe that licensing software and accessing or buying hardware is enough to succeed. The hype around big data has enticed many to think this way, only to find out that the software, hardware and even data flows are not turn-key solutions and require specific skills not present in most companies. Data scientists are not easy to find and typical marketing support staffs don’t always have the requisite background and experience for sourcing and analyzing the right data sets necessary to develop applications for solving business problems. The shortage of human capital presents a major obstacle for most organizations. Short change this leg and you only have access to data with little chance of capitalizing on new insights for business decisions and ultimately better insights is what big data is all about.”

Barry Graubart notes that a report from O’Reilly Strata called “Analyzing the Analyzers” asserts that there are four different flavors of data scientists that companies need. [“The Four Types of Data Scientists,” Content Matters, 1 July 2013] They are:

  • Data Businesspeople: while they have strong technical skills, data businesspeople are focused on using data to drive profits within an organization. They tend to be more senior and have an entrepreneurial focus.
  • Data Creatives tend to have substantial academic experience and excel at machine learning, big data and programming skills. Avid users of open source, Data Creatives tend to have broad-based skills, and can move from role to role more easily.
  • Data Developers tend to focus on the technical issues involved in managing data. They tend to be coders with strong programming and machine learning skills, with less of a focus on business or statistics.
  • Data Researchers typically come from the academic world and have deep backgrounds in statistics or the physical or social sciences. More than the other groups identified, Data Researchers frequently hold a PhD (more than half of those in the survey) and tend to have weaker skills in machine learning, programming or business.

As Drenik stated, data scientists are in high demand and, even if a company could afford to hire a slew of them, they may not be available. One option for addressing this challenge is capturing these different types of analytical expertise in the software itself and ensuring that it is accessible to non-technical personnel. In other words, non-technical personnel need to be able to use natural language queries and the system needs to be smart enough to figure out what it is users are asking. That is the approach we take for many of the solutions we provide at Enterra Solutions®. Concerning the last leg of his big data stool — mobilizing staff for analytics — Drenik writes:

“The last leg of the stool is mobilizing staff to utilize the data and create problem solving applications from analysis. These apps need to address identified business outcomes useful for all levels of the organization. Special attention needs to be paid to prevent the development of or overreliance on a new ‘IT-type’ department built on big data. Concentrating this power in a single area may slow outcomes/application development and cause staff to lose interest.”

I totally agree with Drenik that big data solutions must be democratized so that anyone needing to conduct analysis can do so. Concentrating big data analysis in a single department creates another organizational silo that must be breached to unlock information — the antithesis of why an organization should implement a big data project. Knowledge is power and power is not easily given away. That’s why big data solutions must be available to everyone in an organization who needs to conduct analysis or could benefit from actionable insights. Drenik concludes:

“With a turn-key solution, big data can be more quickly diffused throughout an organization, without having to concentrate access and usage in the hands of a few number crunchers who know how to access big data software. Real business solutions from big data need to be more strategic and derived from analysis of several data sets (orthogonal) to discover ‘unknown knowns’ necessary for success in making business decisions.”

He is correct on most counts. “Turn-key” solutions, however, rarely fit all of the unique circumstances found in a specific company. Solutions need to be tailored to circumstances but must be user-friendly if they are going to be useful to everyone who needs access to the data and its analytic results. After all, it’s the people who are going to ensure that a company’s resources and processes are translated into profits.