The "Big Data" Dialogues, Part 2

Stephen DeAngelis

September 20, 2011

In Part 1 of this series, I discussed a blog by supply chain analyst Lora Cecere that introduced us to the concept of Big Data Supply Chains. In that post, she defined Big Data Supply Chains as “value networks that extend from the customer’s customer to the supplier’s supplier that sense, shape and respond by listening, testing and learning with minimal latency.” In her second post on the subject [“Big Data Supply Chains: Boosting your Vocabulary,” Supply Chain Shaman, 18 August 2011], Cecere argues that companies will never build effective Big Data Supply Chain architectures if they don’t understand and embrace the concepts involved. “This includes,” she writes, “using new types of data and exploiting the increasing power of computing.”

To help us understand these underlying concepts, Cecere offers a vocabulary primer (see below). She believes that if we get the concepts right “we have the opportunity to change the plumbing.” She writes:

“The 1990s definition of integration is obsolete. I believe that an architecture that combines Enterprise Resource Planning (ERP), Advanced Planning Solutions (APS), Supply Chain Execution (SCE) Systems plus Business Intelligence (BI) is not sufficient. Why? Today, supply chain architectures respond. In most cases, it is not even an intelligent response. In fact, it is a DUMB, SLOW and often INACCURATE response. Current technologies either help us make better decisions through the use of optimization in planning or through improved visibility of enterprise transactions.”

Since Cecere’s post is about how “big data” solutions can be game changers, she obviously believes that data is important. But the data has to be handled correctly. In today’s business environment, she asserts, it is not being handled correctly. She explains:

“The data is dirty. The latency of information is long. Most companies have invested in enterprise technologies on a project basis. For most users, satisfaction is low. It should be no surprise that Excel is the number one planning application. Today’s technologies are primarily about supply. Deep solutions for demand are needed and [offer] an untapped opportunity. I believe that the future of supply chain technologies will define processes from the outside-in based on a deep and comprehensive solution for demand. Solutions that sense, shape and drive a profitable response bidirectionally from sell-side to buy-side markets.”

Another well-known supply chain analyst, Bob Ferrari, agrees with Cecere that Big Data Supply Chains represent the future and that the capabilities she discusses are “the foundation for predictive analytics or supply chain cockpit capabilities. They are in essence, the next frontier for enabling smarter and more informed decision making in S&OP and other enterprise management processes.” [“A Response to Big Data Supply Chains- Channel the Problem Into Desired Outcomes,” Supply Chain Matters, 24 August 2011]. There is a difference between knowledge and wisdom. Wisdom encompasses the ability to use knowledge effectively. Cecere and Ferrari make the case that new technologies are going to help supply chain professionals make wiser decisions. Cecere continues:

“If used correctly, I believe that the emerging technologies can allow us to drive a more intelligent response than we were able to achieve in the 1990s through optimization alone. I believe that through the concepts of Big Data Supply Chains that we can evoke the power of computing power to help our supply chain networks not just respond, but to dynamically sense, listen and learn. And, for the more advanced companies, I believe that they will fine tune their architectures to sense, listen, test, shape and drive continuous learning. It is the dawning of a more agile supply chain platform. Machine to machine learning can help our supply chains continuously learn.”

When Cecere talks about systems that dynamically “learn,” I suspect that she is referring to systems that incorporate some form of artificial intelligence (AI) within them. Although it’s important that human stakeholders in the supply chain continuously learn to do their job better, it’s even more important that underlying technology solutions learn and improve. System knowledge is permanent and shared whereas personal knowledge provides more limited gains company-wide. As companies move towards more holistic systems, the ability for the entire company to gain from learned system knowledge will also grow in importance. As Cecere wrote in her first blog, “In big data supply chains, focus on one system of record. Everyone has the moments when they show up at a business meeting only to argue about ‘whose report has the right data’. Solve this problem by writing once and using many times.” Cecere does address human learning. She writes:

“New approaches are emerging, if we can be open to the outcome. It is a time to learn, unlearn and relearn. The other day, I was interviewing a VP of Supply Chain about the future of supply chain technologies. I asked him, “If he had a magic wand, how would he describe what supply chain technologies of the future would look like?” His response, “Lora, I don’t know. I am frustrated. I just know that what we have does not work very well. Somehow, we need to be able to have a more agile sensing platform. Our current architectures are too rigid and the response is too late.” For reference, he works at a global company that is very advanced in supply chain thinking. They have 19 instances of SAP for ERP, and have gone through five different solutions of Advanced Planning (APS), and have superlative systems for order management, warehouse management, and transportation management. They were also early adopters of Multi-tier Inventory Optimization and Strategic Modeling technologies. If you buy my argument, it is time to retool and learn a new jargon.”

Before we get to Cecere’s “new jargon,” let’s return to Ferrari’s comments. In Cecere’s first blog, she talked about challenges facing companies that desire to transform current supply chains into Big Data Supply Chains. One of those challenges was change management. On that topic, Ferrari writes:

“In her commentary, Lora rightfully outlines some of the significant challenges involved towards achieving this concept. While these new approaches have the potential to allow the supply chain to ‘learn and predict’, they do present challenges for gaining executive level investment support, especially the CFO, not to mention the CIO who has to deal with the consequences of exploding data eating up IT infrastructure. … Without executive level leadership and sponsorship, many IT initiatives have little chance of success. Also, as many in our community know, previous multi-year ERP implementation that ended up consuming far more management time and costing too much money have left a sour taste for technology leapfrog. The principles of predictive analytics imply that various supply chain functional teams will need to have much deeper skills in data management, trading partner collaboration and analytics disciplines. It further implies that trading partners and customers will be comfortable with sharing of sensitive data. There are also strong implications for some organizational centralization of analytics teams. In our view, all of these factors point to fairly significant change management. Change does not occur until and unless organizational motivators for change exist. We continue to believe that success, for the business and for customers and suppliers, are always the best catalyst for change, especially in the current volatile and uncertain business environment.”

Ferrari is correct that “change does not occur until and unless organizational motivators for change exist.” In the past, I came across a formula for change management, but, unfortunately, I can’t remember its source. The first factor in the change management equation is dissatisfaction (D = dissatisfaction with the current state). The second factor is vision (V = clear vision for change). This is what Cecere is trying to help business leaders see. The third factor is process (P = process for getting it done). Ferrari addresses the “P” in his next comment (see below). The final factor in the change management equation is cost (C = cost of change). Displayed as a mathematical formula, change management looks like this: D x V x P > C. If any of those factors isn’t present (i.e., equals zero), then change won’t occur. If D = 0 (i.e., there is no felt need for change), then resistance to change will be overwhelming. If V = 0 (i.e., there is no clear vision), the organization will experience both confusion and anxiety if change is undertaken. If P = 0 (i.e., there is no established process for making change happen), then frustration will be high and change will ultimately be rejected. When any factor is zero, the cost of change will obviously be higher than the benefits of change. Ferrari recommends changing a little at a time to reduce the resistance to change. He explains:
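
The multiplicative nature of the formula is the whole point: because the factors multiply rather than add, a zero anywhere zeroes out the case for change. A toy sketch makes that explicit (the 0–10 scoring scale here is my own assumption for illustration, not part of the original formula):

```python
# Toy illustration of the change-management formula D x V x P > C.
# Each factor is scored on an assumed 0-10 scale; C is the cost of change.

def change_succeeds(d, v, p, c):
    """Return True when Dissatisfaction x Vision x Process exceeds Cost."""
    return d * v * p > c

# A strong case on all three factors overcomes a substantial cost.
print(change_succeeds(8, 7, 6, 100))  # True (336 > 100)

# Zero out any one factor -- here, Vision -- and even a trivial cost
# is enough to block change.
print(change_succeeds(8, 0, 6, 1))    # False (0 > 1 fails)
```
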

“Instead [of leapfrogging ahead], why not channel big data challenges into baby step initiatives aimed at a portfolio of information hubs augmented with predictive analytics competencies. Consider pilot programs targeted at specific problems in demand sensing, supply risk, or logistics and distribution orchestration. … Consider that if we are thinking of doing a major renovation of our homes, and we do not understand all that is involved, we often do some homework, seek knowledge from experts, set a reasonable budget and timeline and gain the support of fellow family members. This same analogy can be applied to channeling the frustration of drowning in data into the harvesting of predictive supply chain capabilities. Walk before you run and take steps that bring teams to initial successes along the journey.”

In her first post, Cecere asserted “within five years, I believe that the holistic use of this data will be mainstream.” If she is correct, companies may not have the luxury of taking all of the baby steps that Ferrari would like them to take. Nevertheless, I agree with Ferrari that pilot programs and prototype projects are an excellent way of discovering if the benefits outweigh the costs. If Cecere is right, they undoubtedly will and full implementation will proceed apace. The first baby step towards understanding “the Art of the Possible for Big Data Supply Chains” is becoming familiar with the jargon. Cecere offers the following “new terms to know”:

Big Data Supply Chains. Each person that you talk to will define this differently. When the term is used in a business context, ask what the user means. There is no standard definition, but, in general, it refers to a dataset that is too large and unwieldy to capture, store, search, visualize and share using conventional relational database techniques. It is the world of terabytes, exabytes and zettabytes of data.

Columnar Store. A type of database management system that stores information by column rather than by row. Columnar databases enable in-memory processing, column pruning and compression. They enable outrageous compression factors; it is not uncommon to compress a terabyte of traditional row-store data into tens of gigabytes. The advantage is the ability to aggregate similar data to increase computational speed. The SAP HANA architecture is an example of the advances being made in in-memory processing through advances in columnar store architectures. It has advantages and disadvantages. I believe that SAP HANA will help us with the visualization of large data sets, but it is far from a panacea for redefining supply chain architectures. IBM, too, provides columnar database capability to speed data warehouse queries. The IBM Smart Analytics Optimizer provides this capability with the DB2 relational DBMS on z/OS (mainframes), and related technology such as the Informix data warehouses (e.g., the Informix Warehouse Accelerator).

Fuzzy Logic. A form of computer reasoning that is approximate versus binary logic that is fixed and exact. It enables decision making that is not ‘black and white’ where the best answer lies in understanding the range between completely true and completely false. While optimization helped drive business intelligence in the 1990s, new forms of pattern matching and the use of fuzzy logic will be combined with artificial intelligence to drive new ways to sense, act and then respond. For an early solution in this area, check out Enterra Solutions.
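
To make the contrast with binary logic concrete, consider a shipment that is not simply “late or not late” but late to a degree. The sketch below is my own illustration (the grace and severe thresholds are invented for the example), not any particular product's method:

```python
# Toy fuzzy-logic membership function: instead of a binary late/on-time
# flag, a shipment gets a degree of lateness between 0.0 and 1.0.

def lateness_degree(hours_late, grace=2.0, severe=24.0):
    """0.0 up to `grace` hours late, 1.0 beyond `severe`, linear between."""
    if hours_late <= grace:
        return 0.0
    if hours_late >= severe:
        return 1.0
    return (hours_late - grace) / (severe - grace)

print(lateness_degree(1))   # 0.0 -> effectively on time
print(lateness_degree(13))  # 0.5 -> partially late
print(lateness_degree(48))  # 1.0 -> completely late
```

A downstream rule can then act on that degree (e.g., expedite when lateness exceeds 0.7) rather than on a brittle yes/no flag.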

Hadoop. A framework designed to support data-intensive distributed applications running on thousands of nodes and petabytes of data. It is often referred to as open source Apache Hadoop and is being developed by a global community using Java. Yahoo is the largest contributor. It is new and largely unproven for use by product manufacturers. IBM builds on Apache Hadoop with its InfoSphere BigInsights product to provide an analytic infrastructure for massively distributed data.

MapReduce. MapReduce is the programming model at the heart of Hadoop. Introduced to the market by Google in 2004, this software framework uses the map and reduce functions commonly found in functional programming to speed the processing of large data sets through distributed computing on clusters of computers. There are few use cases for the supply chain, but Teradata’s acquisition of Aster Data opens up new possibilities to combine MapReduce and SQL to solve big data supply chain problems. It makes the processing of distributed semi-structured data easier.
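
The map-and-reduce idea itself is small enough to sketch. Here is the classic word-count example, simulated on a single machine in Python (a conceptual illustration, not production Hadoop code):

```python
# Miniature word count in the MapReduce style: map emits (key, 1) pairs,
# a "shuffle" groups pairs by key, and reduce sums each group. In real
# MapReduce these phases run in parallel across a cluster.
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Group pairs by key (the shuffle) and sum the counts."""
    groups = defaultdict(int)
    for key, count in pairs:
        groups[key] += count
    return dict(groups)

docs = ["big data supply chains", "big data big decisions"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))
# {'big': 3, 'data': 2, 'supply': 1, 'chains': 1, 'decisions': 1}
```
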

Pattern Recognition. Pattern recognition uses techniques such as fuzzy logic to recognize which sets of data resemble others and to identify patterns in large data sets.

R. A free, open source programming language for statistical computing and graphics. In recent years, it has been widely adopted by statisticians for developing statistical software and for data analysis. R is not well suited for big data problems unless you like to write tons of code. It has been widely adopted in bioinformatics but has yet to penetrate the larger analytics market. Companies will be constrained by the memory limitations of R’s architecture, but the open source nature of R will enable data-centric processes.

Natural Language Processing. Techniques for harnessing the power of unstructured, electronic text data in machine learning.
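
In supply chain terms, this is what turns a free-text customer complaint into features a model can learn from. A minimal first step looks like the sketch below (the tokenizer and tiny stop-word list are deliberate simplifications of my own, not a real NLP pipeline):

```python
# Toy bag-of-words featurization: lowercase the text, split it into
# word tokens, drop common stop words, and count what remains.
import re
from collections import Counter

STOP_WORDS = {"the", "was", "and", "a", "of"}

def bag_of_words(text):
    """Return a Counter of content words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

complaint = "The shipment was late and the pallet was damaged"
print(bag_of_words(complaint))
# Counter({'shipment': 1, 'late': 1, 'pallet': 1, 'damaged': 1})
```
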

Ontology. A rules-based approach for semantic association and category relations. We are seeing the use of rule-based ontologies in the evolution of sentiment data (SAS), Supply Chain Execution (Enterra Solutions) and Supply Chain Risk Management (Dun & Bradstreet/Open Ratings).

Semi-structured data. A form of data that contains both structured and unstructured components. It does not conform to the formal structural definitions of relational database tables and data models, but may contain some defined fields, such as a subject line or date, in addition to free-format text data, such as the body of an email.
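
The email example in the definition can be made concrete: a few machine-readable fields sit on top of a free-text body. The record and parser below are my own illustration of that split, not a standard format:

```python
# Toy semi-structured record: defined fields (Subject, Date) plus an
# unstructured free-text body, separated by a blank line.

raw = """Subject: Order 4471 delayed
Date: 2011-09-20

The carrier reported a weather delay at the port.
Expect arrival two days late."""

def parse_record(text):
    """Extract the defined header fields; keep the body as free text."""
    header, _, body = text.partition("\n\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(": ")
        fields[key.lower()] = value
    fields["body"] = body  # the unstructured component stays as-is
    return fields

record = parse_record(raw)
print(record["subject"])  # Order 4471 delayed
print(record["date"])     # 2011-09-20
```

The structured fields can go straight into a database column; the body needs the natural language processing techniques described above.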

Unstructured data. A data set without a pre-set structure. Unstructured data abounds in call-center logs, social listening, contract, servicing and warranty data, and risk management applications. Early applications to harness the power of unstructured data for the supply chain are Dun & Bradstreet’s Open Ratings application and SAS Inc.’s Social Media Analytics application for social media listening.

Since Cecere mentioned Enterra Solutions a couple of times in her post, you can understand why this topic is of interest to me. Enterra Solutions is deeply involved in trying to make Big Data Supply Chains a reality. We’re taking baby steps using prototypes and pilot programs in order to prove concepts and gain trust. In the end, however, we believe that the kind of user-friendly knowledge new technologies can generate and present to decision makers will prove that Cecere’s vision of the future is on the mark.