Trends and Predictions 2020: Big Data

Stephen DeAngelis

January 16, 2020

For many companies, dealing with the implementation of the California Consumer Privacy Act (CCPA), formally known as AB 375, will be the biggest big data story of the year.[1] The CCPA is a big deal because many companies have dealings in California and more and more companies leverage big data to target consumers. Looking back, Jelani Harper writes, “Other than the resurgence of various Artificial Intelligence dimensions, the single most meaningful development in the big data space in the past several years is the burgeoning distribution of data assets.”[2] He continues, “Whereas once those assets were safely confined within the enterprise, the confluence of mobile technologies, the cloud, the Internet of Things, edge computing, containerization, social media, and big data itself has shifted the onus of data management to external, decentralized sources.” That means companies must now worry about whether data managed by others could come back to bite them. Nevertheless, Harper explains, companies will continue to leverage big data despite the risks. He writes, “Organizations can now get the diversity of data required for meaningful machine learning results. The overhead of operating in hybrid, multi-cloud environments is less costly. The very worth of big data has increased with novel opportunities to comprehensively analyze business problems from an array of sources not previously available.” So let’s look at what the future of big data might hold.

Big data trends

Kenneth Maxon notes, “One of the most evolving technologies in the digital age is Big Data technologies. … Big Data is not just simply a term. It is associated with other technologies such as machine learning, artificial intelligence, blockchain, Internet of Things, augmented reality and a whole lot more.”[3] In fact, data is now often described as the world’s most valuable resource. According to Maxon, you can’t appreciate the importance of big data without understanding the technologies big data drives. He lists the ten technologies he considers most important. They are:

1. Data Lakes. “Data Lakes [are] huge data repositories that collect data from different sources and [store data in their] natural state.”

2. Hadoop Ecosystem. “Apache Hadoop may not be as popular as it was before but Big Data isn’t complete without mentioning this technology. It is an open-source framework for distributed processing of big data sets.”

3. NoSQL Databases. “NoSQL databases store unstructured data and provide fast performance. This means that it offers flexibility while handling a wide variety of datatypes at large volumes.”

4. Apache Spark. “Apache Spark is an engine for processing large amounts of data within Hadoop and is 100x faster compared to MapReduce, Hadoop’s standard engine.”

5. Artificial Intelligence. “In a lot of ways, Big Data has played a role in the advancement of AI through its two subset of disciplines; machine learning and deep learning.”

6. Blockchain. “Blockchain is used mainly in functions such as payment, escrow and can speed up transactions, reduce fraud and increase financial security. … An excellent choice for Big Data applications in sensitive industries because it is highly secure.”

7. In-memory Databases. “If a Big Data analytics solution can process data in the RAM, rather than the data stored on the hard drive, it can improve dramatically. … Many of the leading software enterprises are adopting this technology and will surely be a big hit this 2020.”

8. Predictive Analytics. “A subset of Big Data Analytics, Predictive Analytics attempts to forecast future events or behavior through historical data. It works through data mining, modeling, and machine learning techniques to predict what will happen next.”

9. R. “R is an open-source … programming language and software environment designed for working with statistics. … R has become one of the most popular languages in the world.”

10. Prescriptive Analytics. “Prescriptive analytics offers advice to companies about what they should do in order to achieve a desired result.”
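
To make the idea of predictive analytics in item 8 a little more concrete, here is a minimal sketch of forecasting from historical data. It is only an illustration under stated assumptions: the function names (`fit_linear_trend`, `forecast`) and the `monthly_orders` figures are hypothetical and do not come from Maxon's article, and a straight-line trend stands in for the far richer data mining and machine learning techniques he mentions.

```python
def fit_linear_trend(values):
    """Fit y = slope * t + intercept by ordinary least squares.

    The time index t is assumed to be 0, 1, ..., n-1 (evenly spaced periods).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Covariance of (t, y) divided by variance of t gives the slope.
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend steps_ahead periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    future_t = len(values) - 1 + steps_ahead
    return slope * future_t + intercept


# Hypothetical historical data, invented purely for illustration.
monthly_orders = [100, 110, 120, 130, 140]
print(forecast(monthly_orders))
```

Prescriptive analytics, the tenth item, would go one step further and recommend an action based on a forecast like this one, for example how much inventory to hold.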

Sudheesh Nair, CEO of ThoughtSpot, notes, “Every organization in the world is becoming a Big Data company. It’s a requirement to operate in today’s business landscape.”[4]

Big data predictions

Prediction 1. Hadoop will lose its cachet. Nair explains, “Data has become so voluminous, and the need for agility with this data so great … organizations are either building their own data lakes or warehouses, or going directly to the cloud. As that trend accelerates in 2020, we’ll see Hadoop continue to decline.” Looking back on 2019, Alex Woodie (@alex_woodie) writes, “There’s no denying that Hadoop had a rough year in 2019. … But Hadoop compute, in the form of Apache Spark, lives strong.”[5]

Prediction 2. Data will become more democratized. Peter Bailis (@pbailis), Chief Executive Officer at Sisu, predicts, “Everyone in an organization will start acting more like a data analyst on a daily basis, and we’ll see new skills and tools focused on specific use cases emerge.”[6] Ryohei Fujimaki, Founder and CEO of dotData, adds, “Enterprises are focusing on repurposing existing resources as ‘citizen’ data scientists. The rise of [automated machine learning] and data science automation can unlock data science to a broader user base and allow the practice to scale. By empowering citizen data scientists allowing them to execute standard use cases, skilled data scientists can focus on high-impact, technically-challenging projects to produce higher values.”[7]

Prediction 3. Data warehouses will lose out to data lakes. Tomer Shiran (@tshiran), co-founder and CEO of Dremio, predicts, “Given the tremendous cost and complexity associated with traditional on-premise data warehouses, … savvy enterprises [are] moving directly to a next-generation architecture built around cloud data lakes.”[8]

Prediction 4. Data modeling will increase in importance. Harper predicts, “Data modeling for 2020 and beyond will increasingly become characterized by data shapes, digital twins, ensemble modeling, ongoing model management, and model validation measures to satisfy what is quickly becoming the most demanding task in the data sphere. … Data modeling has advanced to reflect the dynamic, fluid forms of data everywhere. It encompasses versatile methods to adapt to modern demands of variegated schema, real-time streaming data, predictive models, and pressing regulatory concerns. Organizations are tasked with becoming well versed in these modeling techniques or, quite possibly, falling prey to competitors who have.”[9]

Concluding thoughts

Bailis believes, “In 2020, we’ll see more context within data, creating actionable data for a variety of departments within a company, requiring tech savvy employees and tools. Ultimately, the platform that can enable more decision making processes will win the market and lead to new ways to answer business questions.” The importance of big data to business motivated Yossi Sheffi (@YossiSheffi), the Elisha Gray II Professor of Engineering Systems at MIT, to assert, “The well-worn adage that a company’s most valuable asset is its people needs an update. Today, it’s not people but data that tops the asset value list for companies.”[10]

Footnotes
[1] Stephen DeAngelis, “California Consumer Privacy Act is Now in Force,” Enterra Insights, 2 January 2020.
[2] Jelani Harper, “2020 Trends in Big Data: The Integration Agenda,” insideBIGDATA, 24 October 2019.
[3] Kenneth Maxon, “Top 10 Big Data Technologies You Must Know In 2020,” Robots.net, 25 December 2019.
[4] Alex Woodie, “Big Data Predictions: What 2020 Will Bring,” Datanami, 23 December 2019.
[5] Ibid.
[6] Peter Bailis, “4 top trends that will impact how organizations use analytics,” Information Management, 17 December 2019.
[7] Ryohei Fujimaki, “Four Big Factors Shaping the Future of Data Science,” insideBIGDATA, 26 October 2019.
[8] Tomer Shiran, “4 top trends for big data analytics in 2020,” Information Management, 24 December 2019.
[9] Jelani Harper, “2020 Trends in Data Modeling: Unparalleled Advancement,” insideBIGDATA, 29 November 2019.
[10] Yossi Sheffi, “What is a Company’s Most Valuable Asset? Not People,” Supply Chain @ MIT, 20 December 2018.