
Lean Six Sigma and Data-as-a-Product: A Winning Combination (Part 4 of 4)

December 10, 2024 - 6 minutes reading time
Article by Natan Van Der Knaap

This final installment of our article series presents the last step in combining Lean Six Sigma with Data-as-a-Product to achieve a winning combination. Part 3 concluded with the development process for data products, fully focused on delivering maximum customer value. In this part, we use those outcomes to optimize operational processes for achieving that maximum value. Lastly, we’ll briefly explore how to maintain these improvements structurally to avoid reverting to old habits.

Identifying waste within the operational process of a data product

Metadata plays a crucial role in the operational process. Well-defined metadata reduces waste by making data easier to locate and interpret. A key aspect is data lineage, which maps the origin, transformations, and movement of data. Process mining tools can visualize all processing steps, providing insight into potential waste and reducing process complexity. Below, we identify key sources of waste in three operational steps: generating, processing, and delivering data.
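To make this concrete, the sketch below shows what a minimal, lineage-aware catalog record for a data product could look like. The field names (source_system, transformations) are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LineageRecord:
    """Metadata describing where a dataset comes from and how it has changed."""
    dataset: str
    source_system: str                                         # origin of the data
    transformations: list[str] = field(default_factory=list)   # ordered processing steps
    last_updated: date | None = None

# Example: a permits dataset whose origin and processing steps are now traceable.
permits = LineageRecord(
    dataset="building_permits",
    source_system="municipal_case_system",
    transformations=["deduplicate", "standardize_addresses", "anonymize_applicant"],
    last_updated=date(2024, 12, 1),
)
print(permits)
```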

1. Generating data

Efficient data collection is essential to avoid missing valuable data, particularly in the public sector, where data collection is often manual and secondary to serving citizens. Focus on collecting high-quality, critical data: data that is essential for decision-making, legislation, or the specific goals of the data product. This prevents wasted effort on irrelevant data. Where possible, automate manual steps to reduce errors and increase efficiency, keeping the focus on valuable data and avoiding unnecessary storage and processing.
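As a minimal sketch of goal-oriented collection, the hypothetical ingest function below accepts only fields that are critical for the product's purpose and automates a completeness check that would otherwise be manual; the field names are assumptions for illustration.

```python
CRITICAL_FIELDS = {"citizen_id", "request_type", "request_date"}

def ingest(record: dict) -> dict:
    """Keep only decision-critical fields and reject incomplete records."""
    filtered = {k: v for k, v in record.items() if k in CRITICAL_FIELDS}
    missing = CRITICAL_FIELDS - set(filtered)
    if missing:
        raise ValueError(f"record rejected, missing critical fields: {missing}")
    return filtered

# Irrelevant fields (here, a free-text note) are dropped before storage.
print(ingest({"citizen_id": "123", "request_type": "passport",
              "request_date": "2024-12-01", "notes": "called twice"}))
```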

Alongside data collection, determining a storage strategy is crucial. Only store data that contributes to business goals and remove outdated data promptly to comply with regulations. Although storage costs have decreased, excessive copies of data still pose risks and add complexity. Poorly coordinated storage and unofficial shadow copies increase the likelihood of errors, such as inconsistencies, and complicate management. This applies to both structured and unstructured data; think of forms scanned as JPEGs, stored as PDFs, and later interpreted with OCR to make them digitally readable. Each of these steps increases complexity, processing costs, and the risk of errors.
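A retention policy can be equally simple in outline. The sketch below flags records whose retention period has lapsed, so outdated data is removed promptly rather than accumulating; the seven-year period is an illustrative assumption, not legal advice.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=7 * 365)  # illustrative period, not legal advice

def expired(records: list[dict], today: date) -> list[dict]:
    """Return records whose retention period has lapsed."""
    return [r for r in records if today - r["created"] > RETENTION]

records = [
    {"id": 1, "created": date(2015, 3, 1)},
    {"id": 2, "created": date(2024, 6, 1)},
]
print(expired(records, date(2024, 12, 10)))  # only record 1 is due for deletion
```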

2. Processing data

The Extract, Transform, Load (ETL) process is central to data processing. During extraction, waste often occurs in complex environments with master data; examples include unnecessary data transfers or multiple "versions of the truth." Long data access times, caused by silos or limited access, are another bottleneck. System integration or centralization in a data warehouse can improve efficiency and speed, enabling departments to collaborate and share data more effectively, provided this is done securely and in a controlled manner.
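To illustrate the "versions of the truth" problem, the sketch below resolves conflicting copies of a master record by keeping the most recently updated one; the conflict-resolution rule and the field names are assumptions for illustration.

```python
from datetime import date

copies = [
    {"citizen_id": "123", "address": "Old Street 1", "updated": date(2023, 1, 5)},
    {"citizen_id": "123", "address": "New Street 9", "updated": date(2024, 8, 2)},
]

def resolve(copies: list[dict]) -> list[dict]:
    """Collapse duplicate master records into one authoritative version each."""
    latest: dict[str, dict] = {}
    for c in copies:
        key = c["citizen_id"]
        if key not in latest or c["updated"] > latest[key]["updated"]:
            latest[key] = c
    return list(latest.values())

print(resolve(copies))  # one version of the truth per citizen
```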

After extraction, data is adapted for specific applications or analyses. Understanding customer needs is essential to avoid unnecessary transformations. Uniform standards and consistent processes, as emphasized by Lean principles, are vital for efficient processing. Consistent standards and robust modeling reduce variability and support data integration across sources. Additionally, consider whether personal data is truly necessary when designing a data product; minimizing it reduces unnecessary security work and the risk of data breaches.
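The sketch below combines two of these points: it applies one uniform date standard across sources and drops personal data the product does not need. The column names, including the Dutch citizen number (bsn), are illustrative assumptions.

```python
from datetime import datetime

PERSONAL_COLUMNS = {"name", "bsn"}  # fields this data product does not need

def transform(row: dict) -> dict:
    """Minimize personal data and enforce one consistent date standard."""
    row = {k: v for k, v in row.items() if k not in PERSONAL_COLUMNS}
    # ISO dates instead of per-source formats reduce variability downstream.
    row["request_date"] = datetime.strptime(row["request_date"], "%d-%m-%Y").date().isoformat()
    return row

print(transform({"name": "J. Jansen", "bsn": "123456789",
                 "request_type": "passport", "request_date": "01-12-2024"}))
```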

Skill waste is another common issue. BI analysts are often tasked with data preparation, spending up to 60% of their time on tasks better suited for data engineers. This misalignment limits their ability to apply mathematical and statistical expertise, leading to inefficiencies.

Finally, waste occurs in applying analyses, whether descriptive, diagnostic, predictive, or prescriptive. Goal alignment is critical: determine which insights the customer needs to avoid overly complex, low-value analyses. Sometimes a simple dashboard aligns better with business needs than a complex predictive model. While advanced analyses can provide substantiated advice, they require a solid foundation. Start with descriptive and diagnostic analyses to understand the situation and needs thoroughly before progressing to advanced techniques like AI for predictions.
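As a small illustration of "descriptive before predictive": a few summary statistics on invented weekly application counts may already answer the customer's question before any model is built.

```python
import statistics

weekly_applications = [120, 135, 128, 160, 142, 155, 149]  # invented example data

print("mean:", round(statistics.mean(weekly_applications), 1))
print("stdev:", round(statistics.stdev(weekly_applications), 1))
print("trend:", "rising" if weekly_applications[-1] > weekly_applications[0] else "falling")
```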

3. Delivering data

Data products are ultimately delivered to customers, ideally through a catalog with metadata. Depending on use cases, this can provide ready-to-use insights or flexibility for customer reuse. If customers experience delays in accessing data, inefficient ETL processes might be to blame. Switching from batch processing to continuous data flows can significantly reduce lead times, aligning with Lean's flow principles. However, batch processing often suffices in the public sector, as seen with the passport demand predictor discussed in the previous installment.
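The difference between batch and flow can be sketched in a few lines: the batch version makes the customer wait for the whole set, while the streaming version delivers each record as soon as it is processed. Here, process() is a stand-in for the real transformation.

```python
from collections.abc import Iterator

def process(record: str) -> str:
    return record.upper()  # stand-in for the real transformation

def batch(records: list[str]) -> list[str]:
    return [process(r) for r in records]   # results arrive only when all are done

def stream(records: Iterator[str]) -> Iterator[str]:
    for r in records:
        yield process(r)                   # each record flows through immediately

for out in stream(iter(["a", "b", "c"])):
    print(out)  # available one by one, shortening lead time
```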

Customer evaluation is a critical final step: are the insights clear, and is the user sufficiently data-literate to utilize the product effectively? Where necessary, additional education can help maximize data value. Besides qualitative feedback, quantitative measurements, such as usage frequency and user demographics, can assess alignment with expectations. Use the PDCA cycle (Plan-Do-Check-Act) to test the product's value and prevent redundancy by exploring its applicability across departments.
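For the quantitative side of the "Check" step, simple usage metrics from an access log can indicate whether the product meets expectations; the log format here is an illustrative assumption.

```python
from collections import Counter

access_log = [  # (user, department) pairs, illustrative
    ("anna", "permits"), ("anna", "permits"),
    ("bram", "permits"), ("chris", "finance"),
]

usage_per_department = Counter(dept for _, dept in access_log)
unique_users = len({user for user, _ in access_log})

print("usage per department:", dict(usage_per_department))
print("unique users:", unique_users)  # low usage may signal low value or low data literacy
```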

Structurally maintaining the data product

With the data process clarified, its steps monitored (Part 2), and waste eliminated, the next step is establishing ownership. Ownership assigns responsibility for realizing and optimizing data products in line with customer needs, so customers know whom to contact about access or quality issues. The product owner can eliminate waste and decide between custom and standard products to meet customer expectations.

Formalizing data governance and roles, such as owner or steward, ensures clarity around access and quality management, saving time and increasing efficiency. A federated governance structure suits this approach, combining centralized strategy, architecture, and standards with the specific needs of individual data products. This breaks down silos and fosters collaboration between business and IT.
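In outline, formalized roles can be as simple as the catalog entry below, which pairs organization-wide structure with per-product responsibility. The role and field names are illustrative assumptions, not a governance standard.

```python
from dataclasses import dataclass

@dataclass
class DataProductEntry:
    name: str
    owner: str         # accountable for value and for eliminating waste
    steward: str       # responsible for day-to-day quality and access
    access_policy: str

catalog = [
    DataProductEntry("passport_demand_forecast", owner="team-civil-affairs",
                     steward="j.devries", access_policy="internal"),
]
print(catalog[0])  # customers know exactly whom to contact
```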

Additionally, this supports a culture of Jidoka (a Lean principle from the Toyota Production System), where problems are addressed immediately, and errors are tackled early by the business itself. The previous installments provide guidance: Part 2 focused on identifying waste, and the later parts explained how to eliminate it. This ensures continuous improvement of data products that deliver customer value.

Lean optimizes data processes

Eliminating unnecessary data storage and traffic while focusing on goal-oriented data makes data products more accessible (addressable), consistent (reliable), easier to reuse (self-descriptive), better integrated with other systems (interoperable), and more secure. Proper metadata enhances findability and comprehensibility, while reduced process complexity, fewer errors, and optimal talent utilization improve data product quality.

Lean principles contribute to the core attributes of a good data product, as defined by Zhamak Dehghani (see Part 1). Viewing data processes and products from a customer perspective is essential. Any element not contributing to customer value constitutes waste and adds variability to process outcomes. Applying Lean principles and data governance optimizes data processes, resulting in consistently high-quality data products that drive customer value creation.

Do you have questions about treating data as products, monitoring data processes with Six Sigma, or improving them with Lean? Contact us for a free consultation. We look forward to helping you create maximum value with optimal data products!
