Data Quality is More Important than Data Size

Stephen DeAngelis

August 07, 2019

A phrase that often popped up during Cold War military conversations was, “Quantity has a quality all its own.” It’s a phrase that could very well have originated in the age of big data rather than during the Cold War. The World Economic Forum declared data a valuable resource, like oil or gold; and, because data is valuable, one’s first thought is you can’t have too much of it. Businesses, however, are reassessing that assumption. Tech journalist Nathan Sykes (@nathansykestech) explains, “Believe it or not, there is such a thing as ‘good data’ and ‘bad data’ — especially when it comes to AI. To be more specific, just having data available isn’t enough: There’s a distinction worth making between ‘useful’ and ‘not-so-useful’ data.”[1] Melanie Chan, a Publishing Executive at Unleashed Software, bluntly states, “Bad data is bad business.”[2] If having to worry about whether data is good or bad weren’t challenging enough, Alex Woodie (@alex_woodie) asserts there is another challenge facing big data projects: complexity. He explains, “For all the progress that companies are making on their big data projects, there’s one big hurdle holding them back: complexity. Because of the high level of technical complexity that big data tech entails and the lack of data science skills, companies are not achieving everything they’d like to with big data.”[3]

Good results start with good data

Part of the complexity associated with big data projects is the data itself. Data can be structured or unstructured. Common sense tells you unstructured data is more complex than structured data, and unstructured data is becoming more prevalent. Timothy King (@BigData_Review) reports IDC predicts, “80 percent of worldwide data will be unstructured by 2025. For many large companies, it’s reached that critical mass already. Unstructured data creates a unique challenge for organizations wishing to use their information for analysis. It can’t easily be stored in a database, and it has attributes that make it a challenge to search for, edit and analyze, especially on the fly. Those factors (and there are many more) are part of the reason why this is such an important topic.”[4] Subject matter experts often write about all of the “Vs” of big data; namely, Volume (there is lots of it), Velocity (it is generated at an amazing pace), Variety (it comes in structured and unstructured forms), Veracity (not all data is accurate), Value (data is the new gold), Vulnerability (database breaches occur on a regular basis), and Virtue (the ethical use of big data). Big data complexity is enough to make your head spin. Get the data right and you’re less likely to go wrong.

Dr. Andrew Rut, chief executive and co-founder of MyMeds&Me, writes, “Despite the widespread benefits, the reality is that big data is not always good data and without good governance it can lead to inefficient spend and poor decision making.”[5] Chan explains, “When it comes to bad data there are three keywords to help you identify it”:

  • Irrelevant: “Just because you have collected the data does not mean it is useful to your business.”
  • Duplicated: “A key cause of inaccuracies and a storage burden, duplicated data is at the root of inconsistencies and disparities.”
  • Decayed: “Data is out-of-date, obsolete and needs regular data management.”
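Chan's three keywords can be turned into simple automated checks. The following is only a minimal sketch in Python, not a method described by Chan: the record fields (`email`, `last_updated`, `active`), the one-year staleness cutoff, and the `flag_bad_records` helper are all hypothetical choices made for illustration, and what counts as "irrelevant" will always be a business decision.

```python
from datetime import date, timedelta


def flag_bad_records(records, today, max_age_days=365,
                     is_relevant=lambda rec: True):
    """Return {record index: [flags]} for records that look like bad data.

    Hypothetical assumptions: each record is a dict with an "email"
    string and a "last_updated" date; "is_relevant" is a caller-supplied
    rule describing which records matter to the business.
    """
    cutoff = today - timedelta(days=max_age_days)
    seen_keys = set()
    flagged = {}
    for index, rec in enumerate(records):
        problems = []
        # Irrelevant: collected, but not useful to the business.
        if not is_relevant(rec):
            problems.append("irrelevant")
        # Duplicated: same customer once formatting differences are removed.
        key = rec["email"].strip().lower()
        if key in seen_keys:
            problems.append("duplicated")
        seen_keys.add(key)
        # Decayed: not updated within the allowed window.
        if rec["last_updated"] < cutoff:
            problems.append("decayed")
        if problems:
            flagged[index] = problems
    return flagged


customers = [
    {"email": "a@example.com", "last_updated": date(2019, 6, 1), "active": True},
    {"email": " A@Example.com ", "last_updated": date(2019, 7, 1), "active": True},
    {"email": "b@example.com", "last_updated": date(2017, 1, 1), "active": True},
    {"email": "c@example.com", "last_updated": date(2019, 5, 1), "active": False},
]

report = flag_bad_records(
    customers,
    today=date(2019, 8, 7),
    is_relevant=lambda rec: rec["active"],
)
print(report)
```

In this made-up sample, the second record is flagged as a duplicate of the first, the third as decayed, and the fourth as irrelevant. Real data would need a more careful matching rule than a normalized email address.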

Chan suggests there are a few indicators you should be aware of that reveal you are using bad data. They include:

  • There are multiple versions of reports on your servers and no two departments are working from the same information.
  • Month end numbers don’t add up and the finance team must manually sift through paperwork looking for human errors and data inaccuracies.
  • CRM information is incorrect or outdated and you have inconsistent formats used when entering data, such as client information and contact details.
  • Inventory control is out of control with constant stock-outs and inventory waste.
  • Barcodes don’t match numbers in the computer system.
  • Vendor data and pricing information is manually synched and not accurate across all systems.
  • Supplier management takes too long, impacting efficiencies and costs.
  • No understanding of who your suppliers are, and orders are sent to the wrong place.
  • More time is spent fixing problems than analyzing and using data to improve your inventory control and customer service.

According to Tendü Yoğurtçu, chief technology officer at Syncsort, organizations must make a concerted effort to improve data quality. She writes, “To improve their data resources, data output and strategic decision making, companies must make an ongoing commitment to data quality, and this begins by creating an overarching strategy put in place before developing projects. A strategy must examine compliance requirements and considerations, as different data quality measures are needed for different purposes and results.”[6] Skepticism about data, asserts Yoğurtçu, is a good place to start. She explains, “Enterprises must be skeptical of data as it essentially determines how the AI will work and bias in the data may be inherent because of past customers, business practices and sales. Historical data used for training the model impacts how algorithms behave and new data used by that model impacts future decisions. Bad data in each fundamental stage can have a significant impact on the business insights driven from the predictive model or the automated actions taken by the system.” Sykes adds, “Modern commerce requires an almost ludicrous amount of data. If it doesn’t already, competitiveness in your industry will soon depend on your ability to mobilize higher technologies and help you derive meaning, intent, direction and insight from the data. … The development of artificial intelligence platforms that deliver meaningful and actionable insights in real-world conditions requires high-quality data.” It’s that plain and simple; it’s just not easy to do.

Concluding thoughts

There is a tremendous upside to getting the data right when analyzing it. Rut explains, “Across all industries, data that is collected and analyzed effectively can give businesses a deep understanding of their own organization and the markets they operate in as well as the ability to accurately predict their customers’ behavior. The opportunities and potential benefits are huge.” According to Yoğurtçu, companies ignore data quality at their own risk. She writes, “Poor data quality could cost organizations an average of $15 million per year in losses, according to Gartner. Organizations know data is a key factor to their success, and for enterprises reliant on this data to make strategic business decisions, bad data can have a direct impact on their bottom line.” Chan simply concludes, “Good data is good business.”

Footnotes
[1] Nathan Sykes, “What To Know About The Impact of Data Quality and Quantity In AI,” SmartData Collective, 17 November 2018.
[2] Melanie Chan, “Telltale Signs You Are Using Bad Data,” Unleashed Software, 1 July 2019.
[3] Alex Woodie, “Increased Complexity Is Dragging on Big Data,” Datanami, 11 September 2018.
[4] Timothy King, “80 Percent of Your Data Will Be Unstructured in Five Years,” Solutions Review, 28 March 2019.
[5] Andrew Rut, “Data quality – the key to ensuring big data is good data,” Enterprise Times, 19 March 2019.
[6] Tendü Yoğurtçu, “Strong data quality key to success with machine learning, AI or blockchain,” Information Management, 14 January 2019.