Can You Have Too Much Data?

Stephen DeAngelis

May 30, 2019

Most people are aware that the World Economic Forum has declared data a valuable resource, like gold. If the analogy is accurate, can an enterprise have too much data? Few people would argue that an enterprise can have too much gold. Andrew White (@mdmcentral), a Vice President and analyst at Gartner, asks, “Are you amassing your data war chest?”[1] Historically, “war chest” is a colloquialism for the resources amassed to fight a war, make acquisitions, or fend off hard times. Again, the implication of having a “data war chest” is that an enterprise can’t have too much data. Eric D. Brown (@EricDBrown), Chief Information Officer of Sundial Capital Research, believes such thinking is flawed. He explains, “When people first start looking at the amount of data they’ve collected, it’s not unusual to think, ‘Hey! We should use *all* the data!’ While I applaud the enthusiasm, just adding more data isn’t always the answer (rarely is it the answer at all).”[2] In other words, a company would be better off reading the story of “Goldilocks and the Three Bears” (in which the main character looks for things that are “just right”) rather than the story of King Midas (in which the main character’s greedy obsession with gold ends very badly).

Too much data can be bad

We live in a world in which the predominant philosophy seems to be, “More is better.” Many companies apply this philosophy to big data. Douglas Fair, COO of InfinityQS International, writes, “Compared with 30 years ago, it seems as though data access is unlimited.”[3] Because gathering data has become relatively easy, our natural tendency, Fair asserts, is to collect it. “It’s human nature,” he writes. “If there’s a way to do something, generally speaking, we do it. … The shackles are off. Collect all you want, anytime you want, and in any way you want. Collect. Collect. Collect.” He then asks an important question: “Should you collect the data?” Brown asserts that business executives should ponder that question carefully. He notes, “When dealing with most things in life, people are usually pretty good at determining what is plausible by using their common sense and experience to gauge whether something they hear/read is correct or not. This approach works well until it doesn’t. My experience has shown me that it doesn’t work that well when it comes to big data, AI, and machine learning. Most people’s intuition is wrong (e.g., more data must be better).” The success of companies like Google, Facebook, and Amazon demonstrates that having lots of data can be profitable. However, data breaches and privacy abuses have resulted in more stringent privacy regulations (e.g., the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) of 2018). So, when it comes to collecting data, there is a lot more to think about than how much can be collected and stored.

Companies adopting the “more is better” philosophy can get into trouble. Lawyers Judy Selby (@judy_selby) and Melissa Kosack explain that failing to handle data appropriately can result in significant costs.[4] They write, “According to recent research, companies must recognize this new reality in which corporate reputations may be negatively impacted by decisions they make concerning data within their control. As companies are incurring significant costs to capitalize on the enormous amounts of data — so-called Big Data — constantly generated by the Internet of Things (IoT), social media platforms, websites, and other sources, they must appreciate that their use, misuse and governance of data can have a direct impact on their goodwill and ultimate valuation.” Collecting, storing, and analyzing data isn’t cost-free. Add reputational costs and potential fines for privacy breaches to those costs, and the downside of big data quickly becomes apparent. Jennifer Marangos writes, “That’s why the question of whether having too much customer data is a liability may simply need to be reframed, according to at least one big data expert. It’s not whether having too much customer data is a liability, but rather whether the good that can be done with all of that data outweighs the bad, said Michael Imerman, the Theodore A. Lauer Distinguished Professor of Investments at Lehigh University, Bethlehem.”[5] Weighing the potential benefits and drawbacks of collecting and storing all types of big data, not just personal data, is a good idea.

Getting data just right

Fair suggests, “Before becoming a data glutton, you need to ask some simple, yet challenging questions.” Those questions are:

1. Why do we need to gather this data?
   a. Is this a short-term data collection necessary to solve a problem?
   b. Is this data required to fulfill a long-term strategic imperative?

2. How will the data be used after it is collected?

3. Who will be evaluating that data?

4. How will the data be evaluated?

5. What is a reasonable, rational amount of data to collect?

6. How frequently do we need to collect the data?

7. Do we really need to collect data every few milliseconds?
   a. If so, what purpose would it serve?
   b. How will the data be used?

Answering those questions will get you closer to getting the data just right. Even when you’ve figured out the right kind of data to collect, store, and analyze, you need to ensure the data can provide the best possible insights. Vikas Bhatt, CEO of OnlyB2B, explains, “Since real life data is dirty, it gets costly and, therefore, the significance of data quality management in business is highlighted. Data cleansing or scrubbing or appending is the procedure of correcting or removing inaccurate and corrupt data. This process is crucial and emphasized because wrong data can drive a business to wrong decisions, conclusions, and poor analysis, especially if the huge quantities of big data are into the picture. There are businesses who have lost a huge amount of money due to the big bad data.”[6]
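Bhatt’s description of cleansing (correcting or removing inaccurate and corrupt data) can be illustrated with a minimal Python sketch. The `CustomerRecord` type, its fields, and the validation rules used here (a non-empty ID, an email containing “@”, and the first record winning when IDs repeat) are hypothetical assumptions chosen for illustration; they are not a standard and not any particular vendor’s method.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record layout, assumed for illustration only.
    customer_id: str
    email: Optional[str]
    country: Optional[str]


def cleanse(records: Iterable[CustomerRecord]) -> List[CustomerRecord]:
    """Correct, remove, and deduplicate records (assumed rules, not a standard)."""
    cleaned = {}
    for r in records:
        # Correct: normalize whitespace and letter case.
        email = r.email.strip().lower() if r.email else None
        country = r.country.strip().upper() if r.country else None

        # Remove: records with no ID or an obviously malformed email.
        if not r.customer_id or not email or "@" not in email:
            continue

        # Deduplicate: keep only the first record seen for each ID.
        if r.customer_id not in cleaned:
            cleaned[r.customer_id] = CustomerRecord(r.customer_id, email, country)
    return list(cleaned.values())
```

For example, given one well-formed record with stray whitespace and mixed case, a duplicate of it, a record with a malformed email, and a record with no ID, the sketch returns a single normalized record. Real cleansing rules would come from the business’s own data-quality requirements.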

Concluding thoughts

Companies have proven time and again that data is important, even critical, for success in the digital age. In fact, many analysts believe companies that fail to transform into digital enterprises (i.e., organizations that can leverage big data) risk failure. However, getting data “just right” is important because too much data can result in unexpected, negative consequences. Another important point is that having the right data without the right analytics is meaningless. As Fair observes, “If it just sits in a database, data is worthless.” Many pieces must be put in place to make data valuable for your enterprise, but it all begins with getting the data right in the first place.

Footnotes
[1] Andrew White, “Are You Amassing Your Data War Chest?” Gartner Blog, 15 April 2019.
[2] Eric D. Brown, “4 big data myths, busted,” The Enterprisers Project, 23 April 2019.
[3] Douglas Fair, “Drowning in Data: Consequences of Having Too Much of a Good Thing,” IndustryWeek, 22 August 2018.
[4] Judy Selby and Melissa Kosack, “How Much is that Big Data Worth? — Big Data Decisions Impact Business Valuations,” Datafloq, 29 June 2016.
[5] Jennifer Marangos, “Big Data, Big Risk?” Lehigh Valley Business, 15 January 2018.
[6] Vikas Bhatt, “The Significance of Data Cleansing in Big Data,” AIthority, 18 February 2019.