
Lean Six Sigma and Data-as-a-Product: a winning combination (part 3 of 4)

November 28, 2024 - 3 minutes reading time
Article by Natan Van Der Knaap

In the first part of this series, you learned how data can be approached as an independent product by optimizing datasets for specific use cases, an essential condition for creating value through data-driven work. Part two introduced the Six Sigma methodology to make the data process transparent and to ensure that each process step meets customer requirements, resulting in high-quality, reliable data products that customers can use consistently. In this third installment, we go a step further and introduce the Lean methodology, which improves efficiency by reducing waste. This aligns with the vision of DAMA International*, which holds that all data management disciplines should contribute to high data quality in support of the organization. Each article is a step toward enabling customers to benefit fully from data products.

What is lean?

Lean is a methodology particularly suited to the development and production of data products. From the customer’s perspective, Lean considers any activity that does not directly contribute to what the customer truly needs as waste (Tempelman & Schildmeijer, 2023). Lean aims to identify and eliminate these wastes, leaving only value-adding steps. This approach leads to data products that better align with customer expectations, thereby achieving higher quality. Lean defines eight types of waste within a process, which can be interpreted in the data world as follows:

  1. Transportation: unnecessary movement of (master) data between systems, processes, or departments, consuming time and resources without adding value.
  2. Inventory: storing data that is not needed, such as outdated or redundant information, which offers no added value to current processes or analyses.
  3. Motion: inefficient actions and processes by people in obtaining and processing data, reducing productivity.
  4. Waiting: time lost because users, processes, or systems must wait for data to become available or to be processed.
  5. Overproduction: collecting and producing more data than necessary, leading to excess storage and complexity without added customer value.
  6. Process complexity: unnecessary steps, excessive quality controls, and a lack of strategy or standards, resulting in inefficient and hard-to-manage data processes.
  7. Defects: incorrect data input, processing, or insufficient validation, leading to unreliable analyses and inefficient processes.
  8. Unused talent: underutilizing employees by neglecting their technical skills or business knowledge.
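To make the Inventory waste above more concrete, here is a minimal Python sketch. It assumes a hypothetical dataset catalog that records, for each stored dataset, when it was last read and how many downstream reports or processes consume it. The Dataset class, the inventory_waste function, and the 180-day idle threshold are illustrative assumptions, not part of Lean or of any specific tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    """Hypothetical catalog entry for one stored dataset."""
    name: str
    last_accessed: date  # most recent read by any user or process
    consumers: int       # number of downstream reports/processes using it


def inventory_waste(catalog, today, max_idle_days=180):
    """Return names of datasets that look like Inventory waste.

    A dataset is flagged when nobody consumes it, or when it has not
    been read for longer than max_idle_days (an assumed threshold).
    """
    cutoff = today - timedelta(days=max_idle_days)
    return [
        ds.name
        for ds in catalog
        if ds.consumers == 0 or ds.last_accessed < cutoff
    ]


if __name__ == "__main__":
    catalog = [
        Dataset("sales_daily", date(2024, 11, 20), consumers=4),
        Dataset("sales_2019_backup", date(2023, 2, 1), consumers=0),
        Dataset("customer_master_copy", date(2024, 3, 15), consumers=1),
    ]
    # Flags the unused backup and the copy idle since March.
    print(inventory_waste(catalog, today=date(2024, 11, 28)))
```

A flagged dataset is a candidate for review with its owner, not for automatic deletion; retention rules or legal obligations may still require keeping it.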

By identifying these types of waste and analyzing the operational process through the data management framework described in the DAMA-DMBOK**, you can eliminate waste and create more customer value. Some wastes are easier to address than others; start with those that have a significant impact on value creation and are relatively simple to resolve.
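The advice to start with high-impact, easy-to-resolve wastes can be sketched as a simple impact/effort ranking. In this minimal Python example, the WasteFinding record, the 1-5 scores, and the quick-win thresholds are hypothetical choices a team would agree on itself; the scoring is a judgment aid, not a formula prescribed by Lean or the DAMA-DMBOK.

```python
from typing import NamedTuple


class WasteFinding(NamedTuple):
    """One observed waste, scored by the team (hypothetical 1-5 scales)."""
    waste_type: str   # one of the eight Lean wastes
    description: str
    impact: int       # 1 = little effect on customer value, 5 = large effect
    effort: int       # 1 = easy to resolve, 5 = hard to resolve


def prioritize(findings):
    """Order findings by impact (highest first), then by effort (lowest first)."""
    return sorted(findings, key=lambda f: (-f.impact, f.effort))


def quick_wins(findings, min_impact=4, max_effort=2):
    """Return findings that are both high-impact and relatively simple to fix."""
    return [
        f for f in prioritize(findings)
        if f.impact >= min_impact and f.effort <= max_effort
    ]


if __name__ == "__main__":
    findings = [
        WasteFinding("Waiting", "Nightly load finishes after the morning report", 5, 2),
        WasteFinding("Inventory", "Unused backup tables", 2, 1),
        WasteFinding("Defects", "No validation on manual input", 5, 4),
        WasteFinding("Transportation", "Master data copied across three systems", 4, 1),
    ]
    for f in quick_wins(findings):
        print(f.waste_type, "-", f.description)
```

High-impact items that are hard to fix (such as the Defects example) remain visible in the full prioritize() ordering; they are simply not the first things to tackle.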

The importance of a robust development process for a new data product

Understanding business processes and required decisions

A common misconception in data-driven work is that data dictates direction, whereas it is people (the business) who drive decisions. Data supports decisions and processes led by the business. Every dataset represents only part of the reality and requires interpretation within a broader context. Therefore, it is essential to first have a clear understanding of business processes and the decisions needed before diving into the data. As discussed in the first article, organizations create value through their processes, and data as a product can help make these processes more effective and efficient.

From push to pull

A well-known challenge is the lack of alignment between business and IT, which hampers the effectiveness of data products. The BI team plays a crucial role as a bridge between business and IT: while IT focuses on technical solutions (like API integrations and performance), BI translates business questions into data models and analyses. This requires a shift from traditional push thinking (produce as many data products as possible) to pull thinking, where the business clearly specifies the insights it genuinely needs.

Since these insights and specifications are not always fully known at the start, a flexible, iterative process is needed in which the business, BI specialists, and IT collaborate on the data product. By remaining adaptable and regularly aligning goals, the team minimizes waste and keeps the product aligned with real needs. Depending on the organization, the BI team may report to IT (a push situation) or be positioned closer to the business (a pull situation). In either case, strong collaboration between all parties is essential to get the most value from data.


Goal orientation

A robust development process begins with a thorough exploration of the customer’s needs. By clearly defining what the customer requires, an operational process can be designed to deliver value without waste. This is where goal orientation comes into play: every step in the data process must contribute to the customer’s objectives without unnecessary actions, such as collecting or analyzing data that provides no direct value. The clearer the goals are from the start, the more purposefully the process can be structured.

An effective development process focuses on creating customer value. Understand the purpose of the data product, the insights the customer seeks, and the requirements regarding speed, security, privacy, and data quality. This results in a flexible process that continuously enables the customer to derive more value from data-driven work.
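Goal orientation can also be checked mechanically once the customer's needs are written down. The minimal Python sketch below assumes a hypothetical specification listing the insights the customer seeks and the requirements (such as speed, security, privacy, and data quality), with each process step tagged by the goals it serves. Any step that serves none of them is flagged as potential waste. The class names, tags, and example steps are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    """Hypothetical specification agreed with the customer up front."""
    purpose: str
    insights: set[str]                                     # questions to answer
    requirements: set[str] = field(default_factory=set)    # e.g. speed, privacy


@dataclass
class ProcessStep:
    """One step in the data process, tagged with the goals it serves."""
    name: str
    serves: set[str]


def unjustified_steps(spec, steps):
    """Return names of steps that serve none of the customer's goals."""
    goals = spec.insights | spec.requirements
    return [s.name for s in steps if not (s.serves & goals)]


if __name__ == "__main__":
    spec = DataProductSpec(
        purpose="Weekly stock replenishment decisions",
        insights={"stock level per store", "expected demand per week"},
        requirements={"speed", "data quality", "privacy"},
    )
    steps = [
        ProcessStep("load sales transactions", {"expected demand per week"}),
        ProcessStep("validate store codes", {"data quality"}),
        ProcessStep("pseudonymize customer IDs", {"privacy"}),
        ProcessStep("export full history to spreadsheet", set()),
        ProcessStep("compute social media sentiment", {"brand perception"}),
    ]
    print(unjustified_steps(spec, steps))
```

A flagged step is a prompt for a conversation with the customer: either it serves a goal that was never written down (and the specification should be updated), or it is waste.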

*DAMA International is a non-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.

**DAMA-DMBOK is a framework for information and data management, published as a book and divided into eleven knowledge areas. This framework helps organize activities related to information and data governance. DMBOK stands for Data Management Body of Knowledge.

In the fourth and final part of this series, we will apply the eight Lean wastes to the operational process of data products. Anything that does not contribute to customer value (as defined in the development process) will be considered waste.
