The Age of Big Data: Is It Coming or Has It Arrived?

Stephen DeAngelis

March 13, 2012

Last November, Jessica Twentyman wrote, “‘Big data’ is the ‘next frontier for innovation, competition and productivity’, according to McKinsey.” [“Big data is ‘the next frontier’,” Financial Times, 14 November 2011] The McKinsey study Twentyman cites implies that the era of Big Data is coming but has not yet arrived. As you’ll read below, others believe it is already here. Twentyman continued:

“[The] report by the management consultancy argued that the successful companies of tomorrow, whether they are market leaders or feisty start-ups, will be those that are able to capture, analyse and draw meaningful insight from large stores of corporate and customer information.”

Steve Lohr, a technology reporter for The New York Times, believes the age of Big Data is more than a glow rising over the horizon: the dawn has broken and the age is already here. [“The Age of Big Data,” 11 February 2012] He writes:

“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, ‘Big Data, Big Impact,’ declared data a new class of economic asset, like currency or gold.”

Lohr notes that good data consultants are being snapped up by IT-related companies. “They help businesses make sense of an explosion of data — Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers — to guide decisions, trim costs and lift sales.” Lohr continues:

“A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with ‘deep analytical’ expertise and 1.5 million more data-literate managers, whether retrained or hired. The impact of data abundance extends well beyond business. Justin Grimmer, for example, is one of the new breed of political scientists. A 28-year-old assistant professor at Stanford, he combined math with political science in his undergraduate and graduate studies, seeing ‘an opportunity because the discipline is becoming increasingly data-intensive.’ His research involves the computer-automated analysis of blog postings, Congressional speeches and press releases, and news articles, looking for insights into how political ideas spread.”

Web sites like that of the Washington Post now routinely publish statistics and analysis of data gathered about presidential candidates from social media sources like Facebook and Twitter. Lohr says that business and politics aren’t the only areas interested in Big Data. He continues:

“The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. ‘It’s a revolution,’ says Gary King, director of Harvard’s Institute for Quantitative Social Science. ‘We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.’”

For some people, that language sounds ominous. “Privacy advocates take a dim view,” Lohr writes, “warning that Big Data is Big Brother, in corporate clothing.” Privacy issues are probably the greatest concern being raised. There is an upside to Big Data analysis, however, that promises to help us understand the world better and to get the most from the limited resources on the planet. Lohr continues:

“What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. There is a lot more data, all the time, growing at 50 percent a year, or more than doubling every two years, estimates IDC, a technology research firm. It’s not just more streams of data, but entirely new ones. For example, there are now countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates. They can measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air. Link these communicating sensors to computing intelligence and you see the rise of what is called the Internet of Things or the Industrial Internet. Improved access to information is also fueling the Big Data trend. For example, government data — employment figures and other information — has been steadily migrating onto the Web. In 2009, Washington opened the data doors further by starting Data.gov, a Web site that makes all kinds of government data accessible to the public. Data is not only becoming more available but also more understandable to computers. Most of the Big Data surge is data in the wild — unruly stuff like words, images and video on the Web and those streams of sensor data. It is called unstructured data and is not typically grist for traditional databases.”

This unstructured data has been a real challenge in the past. It’s one of the reasons that the age of Big Data has always seemed to be just over the horizon. As Lohr reports, however, “the computer tools for gleaning knowledge and insights from the Internet era’s vast trove of unstructured data are fast gaining ground.” Those tools are what finally ushered in the era of Big Data. Lohr continues:

“At the forefront are the rapidly advancing techniques of artificial intelligence like natural-language processing, pattern recognition and machine learning. Those artificial-intelligence technologies can be applied in many fields. For example, Google’s search and ad business and its experimental robot cars, which have navigated thousands of miles of California roads, both use a bundle of artificial-intelligence tricks. Both are daunting Big Data challenges, parsing vast quantities of data and making decisions instantaneously. The wealth of new data, in turn, accelerates advances in computing — a virtuous circle of Big Data. Machine-learning algorithms, for example, learn on data, and the more data, the more the machines learn.”

Since Enterra Solutions offers Big Data solutions and uses artificial intelligence techniques, including natural-language processing, pattern recognition and machine learning, I admit to having a favorable bias towards Big Data analysis and the value it can bring to the world. I’m not alone, however. Lohr continues:

“To grasp the potential impact of Big Data, look to the microscope, says Erik Brynjolfsson, an economist at Massachusetts Institute of Technology’s Sloan School of Management. The microscope, invented four centuries ago, allowed people to see and measure things as never before — at the cellular level. It was a revolution in measurement. Data measurement, Professor Brynjolfsson explains, is the modern equivalent of the microscope. Google searches, Facebook posts and Twitter messages, for example, make it possible to measure behavior and sentiment in fine detail and as it happens. In business, economics and other fields, Professor Brynjolfsson says, decisions will increasingly be based on data and analysis rather than on experience and intuition. ‘We can start being a lot more scientific,’ he observes.”

With the amount of data being generated each day, Brynjolfsson might have been more accurate to say that today’s data measurement tools are more like the electron microscope than its lens-based predecessor. Lohr insists that “there is plenty of anecdotal evidence of the payoff from data-first thinking.” He points, for example, to the famous book Moneyball by Michael Lewis and the movie of the same name that it spawned, starring Brad Pitt. He continues:

“Retailers, like Walmart and Kohl’s, analyze sales, pricing and economic, demographic and weather data to tailor product selections at particular stores and determine the timing of price markdowns. Shipping companies, like U.P.S., mine data on truck delivery times and traffic patterns to fine-tune routing. … Police departments across the country, led by New York’s, use computerized mapping and analysis of variables like historical arrest patterns, paydays, sporting events, rainfall and holidays to try to predict likely crime ‘hot spots’ and deploy officers there in advance. Research by Professor Brynjolfsson and two other colleagues, published last year, suggests that data-guided management is spreading across corporate America and starting to pay off. They studied 179 large companies and found that those adopting ‘data-driven decision making’ achieved productivity gains that were 5 percent to 6 percent higher than other factors could explain.”

For companies like mine, statistics like those are great to read about and pass along to potential clients! The potential uses for Big Data analysis are limitless. Lohr explains:

“The predictive power of Big Data is being explored — and shows promise — in fields like public health, economic development and economic forecasting. Researchers have found a spike in Google search requests for terms like ‘flu symptoms’ and ‘flu treatments’ a couple of weeks before there is an increase in flu patients coming to hospital emergency rooms in a region (and emergency room reports usually lag behind visits by two weeks or so). Global Pulse, a new initiative by the United Nations, wants to leverage Big Data for global development. The group will conduct so-called sentiment analysis of messages in social networks and text messages — using natural-language deciphering software — to help predict job losses, spending reductions or disease outbreaks in a given region. The goal is to use digital early-warning signals to guide assistance programs in advance to, for example, prevent a region from slipping back into poverty. In economic forecasting, research has shown that trends in increasing or decreasing volumes of housing-related search queries in Google are a more accurate predictor of house sales in the next quarter than the forecasts of real estate economists. The Federal Reserve, among others, has taken notice. In July, the National Bureau of Economic Research is holding a workshop on ‘Opportunities in Big Data’ and its implications for the economics profession.”

I suspect that a lot of people would like to see some advancement in the field of economics. Laurence J. Peter once stated, “An economist is an expert who will know tomorrow why the things he predicted yesterday didn’t happen today.” Big Data analysis may improve those forecasts. For people out of work, Big Data analysis has demonstrated that people with whom you have “weak ties” could play a big role in helping you land your next job. Lohr explains:

“Today, social-network research involves mining huge digital data sets of collective behavior online. Among the findings: people whom you know but don’t communicate with often — ‘weak ties,’ in sociology — are the best sources of tips about job openings. They travel in slightly different social worlds than close friends, so they see opportunities you and your best friends do not.”

Before painting too rosy a picture of Big Data, Lohr adds a few caveats. He continues:

“Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of ‘false discoveries.’ The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that ‘many bits of straw look like needles.’ Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, ‘one of the most pernicious uses of data.’”
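
Hastie’s point that “many bits of straw look like needles” can be made concrete with a small simulation. The sketch below is my own illustration, not anything taken from Lohr’s article or from Hastie’s work. It assumes a hypothetical data set of 1,000 candidate variables, each of them pure random noise, and checks every one against an equally random target using the conventional 5 percent significance cutoff. The function names, the sample size of 50 and the critical correlation of roughly 0.279 (the approximate two-sided 5 percent threshold for 50 samples) are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_false_discoveries(n_features=1000, n_samples=50,
                            r_critical=0.279, seed=0):
    """Count pure-noise features that look 'significantly' correlated.

    Every feature and the target are independent Gaussian noise, so any
    correlation that clears the threshold is a false discovery.
    r_critical is an assumed, approximate 5 percent two-sided cutoff
    for n_samples = 50.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson_r(feature, target)) > r_critical:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Noise features that passed the test:", count_false_discoveries())
```

Because the cutoff lets through about 5 percent of tests by chance alone, a run like this should flag on the order of 50 of the 1,000 meaningless variables. The more variables a data set offers, the more such straw will pass for needles unless the analyst corrects for the number of comparisons made.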

Those are good cautions to keep in mind. A company looking for a provider of Big Data analysis should choose one that doesn’t come with preconceived notions about what answers should or will be found. The provider should have an open mind and a curious disposition. The best Big Data analysis is about discovery and learning, not about shoring up old beliefs. Lohr continues:

“Data is tamed and understood using computer and mathematical models. These models, like metaphors in literature, are explanatory simplifications. They are useful for understanding, but they have their limits. A model might spot a correlation and draw a statistical inference that is unfair or discriminatory, based on online searches, affecting the products, bank loans and health insurance a person is offered, privacy advocates warn. Despite the caveats, there seems to be no turning back. Data is in the driver’s seat. It’s there, it’s useful and it’s valuable, even hip.”

Einstein might have been the last mathematician who made the profession appear hip. The age of Big Data is changing that. Today mathematicians are in demand. As Andrew Gelman, a statistician and political scientist at Columbia University, told Lohr, “There is this idea that numbers and statistics are interesting and fun. It’s cool now.” I hope that message is heard and heeded by up-and-coming generations that seem to be abandoning science and math as career choices. If we can convince them that data is as valuable as gold, perhaps we can stir a “Big Data Rush” to Silicon Valley or to Newtown, PA, where my company is headquartered.