From an HR perspective: The HR department manager, responsible for the employee availability data product, has full knowledge of how the data is collected and who has the rights to view it. When an external customer requested this data product for use in a prediction model, it was decided after consultation that the original dataset could not be shared; an anonymized version, however, could serve as the basis for the final insights. The customer had specific quality requirements for the dataset. To meet these requirements, the HR department wanted to use Six Sigma to gain structured insight into the quality of its operational data processes and demonstrate that it can meet the set requirements. This enables the department to deliver consistent and reliable data products.
The first step within Six Sigma is establishing the specific quality requirements and needs of the customer, also known as the Voice of the Customer (VoC). This input is then translated into measurable process requirements, or Critical to Quality (CTQ) characteristics. The CTQ characteristics form the measurable performance standards within the operational process that must be met to fulfil customer expectations (VoC). The aim is to deliver a data product that is optimally aligned with specific use cases, as discussed in article 1. By consistently monitoring and optimizing these CTQ characteristics, variation in the process can be reduced, which increases the reliability and quality of the data product.
Example VoC: "I need a weekly dataset with (anonymized) employee data, including their availability for the next two weeks and the number of hours they actually worked."
Translation into CTQ characteristics:
- Timeliness: A new dataset version must be available weekly on Friday.
- Completeness: 100% of the (anonymized) employees and their planned hours for the next two weeks must be in the dataset.
- Accuracy: At least 90% of the 'actually worked' column must be correct.
- Consistency: For the same employee, availability and hours worked must correspond.
- Validity: The number of hours worked must be displayed in whole numbers.
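CTQ characteristics like these translate naturally into automated checks. As a minimal sketch in Python (the record layout, field names, and the four-hour consistency tolerance are illustrative assumptions, not part of the actual data product), validity and consistency could be verified like this:

```python
# Hypothetical record layout for the anonymized availability dataset.
records = [
    {"employee_id": "a1", "available_hours": 36, "worked_hours": 36},
    {"employee_id": "b2", "available_hours": 24, "worked_hours": 24.0},
    {"employee_id": "c3", "available_hours": 32, "worked_hours": 35.5},
]

def validity_violations(rows):
    """CTQ 5 (validity): worked hours must be whole numbers."""
    return [r["employee_id"] for r in rows
            if not float(r["worked_hours"]).is_integer()]

def consistency_violations(rows, tolerance=4):
    """CTQ 4 (consistency): worked hours must correspond to availability.
    Here interpreted as deviating by at most `tolerance` hours (an assumption)."""
    return [r["employee_id"] for r in rows
            if abs(r["worked_hours"] - r["available_hours"]) > tolerance]

print(validity_violations(records))     # c3: 35.5 is not a whole number
print(consistency_violations(records))  # all within the assumed tolerance
```

In practice, checks like these would run as part of the weekly delivery pipeline, with their results logged as CTQ measurements.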
You can see the theory of the DAMA DMBoK (Data Management Body of Knowledge) and Six Sigma coming together here. The CTQ characteristics are measurable properties that map directly onto the quality dimensions within data management: think of dimensions such as timeliness, completeness, accuracy, consistency, and validity, extensively described in the DAMA DMBoK and summarized in my previous article about data quality. By actively monitoring CTQ characteristics, the HR department can not only meet customer expectations but also establish internal quality standards. This makes it possible to deliver data products that are more consistent, relevant, and valuable. In this way, Six Sigma can help the organization identify and address potential defects and variations internally, before the product goes to the customer.
Monitoring customer requirements of the data product
Monitoring CTQ characteristics can be done using Statistical Process Control (SPC). SPC is a method to manage processes by measuring variations in input or output, or analyzing process steps (DAMA International, 2017). SPC is based on the assumption that if a process with consistent input is executed consistently, it will produce consistent results. If the input or execution changes, the results will change too.
For each defined CTQ characteristic, the next step is to determine a lower limit, an upper limit, or both, either in consultation with the customer or by the product owner themselves. By taking measurements on the CTQ characteristics, you gain insight into how well the product meets customer specifications. Automating these measurements, for example by automatically reading logs or using scripts to monitor properties such as completeness and accuracy, increases efficiency and consistency. This saves time and helps detect deviations more quickly.
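As an illustration of such limits, a common SPC convention is to derive control limits from historical measurements as the mean plus or minus three standard deviations. The sketch below uses hypothetical weekly accuracy scores; it is not the HR department's actual implementation:

```python
import statistics

def control_limits(measurements, sigmas=3):
    """Derive SPC control limits (lower, upper) from historical measurements
    as the mean +/- a number of standard deviations."""
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)
    return mean - sigmas * sd, mean + sigmas * sd

def out_of_control(measurements, lcl, ucl):
    """Return the measurements that fall outside the control limits."""
    return [m for m in measurements if not (lcl <= m <= ucl)]

# Hypothetical weekly accuracy scores of the 'actually worked' column.
history = [0.96, 0.95, 0.97, 0.96, 0.94, 0.95, 0.96]
lcl, ucl = control_limits(history)

new_scores = [0.95, 0.85]  # 0.85 falls outside the limits
print(out_of_control(new_scores, lcl, ucl))
```

A point outside these limits signals that the process has changed and warrants investigation, even before the customer's own specification limit is breached.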
However, it is important to realize that measuring all CTQ characteristics can be complex and time-consuming. Therefore, it is wise to focus on the most critical characteristics, rather than trying to measure everything. This requires a risk assessment by the product owner: which data and process steps must always be of high quality and intensively monitored, and which are less critical for the customer if an error occurs?
Not all process steps have an equal impact on the final quality of the data product. Some steps, such as generating and processing data, have a greater influence on the end result and must be controlled more strictly. Other steps can be followed with less intensive SPC measures. Compare it to manufacturing a car: you want to test the engine very thoroughly, while the parcel shelf is less important. A layered approach to monitoring helps increase process efficiency without compromising the end quality of the data product.
Example of SPC in the data process
When the outcome of a measurement falls outside the defined specifications, we call it a defect. In that case, the product is not approved and must be adjusted or rejected. Variation in the process leads to variation in the data product and can result in poor end results. With SPC, you can see at a glance when the quality of a process step in your data product doesn't meet customer requirements. After all, the data product is the end result of the process. The number of defects in the process steps, such as generating or processing data, is an indicator of the process's stability and quality.
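Counting defects against the specification limits is straightforward to automate. A minimal sketch (the completeness figures per delivery are hypothetical):

```python
def defect_rate(measurements, lower=None, upper=None):
    """Share of measurements that fall outside the specification limits."""
    defects = sum(
        1 for m in measurements
        if (lower is not None and m < lower) or (upper is not None and m > upper)
    )
    return defects / len(measurements)

# Hypothetical completeness score per weekly delivery; the spec demands 100%.
completeness = [1.0, 1.0, 0.98, 1.0]
print(defect_rate(completeness, lower=1.0))  # one delivery in four is a defect
```

Tracking this rate per process step over time shows whether a step is stable or a recurring source of defects.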
Below is an elaboration of SPC for CTQ 1 and CTQ 2 of the employee availability dataset. This elaboration focuses on measuring timeliness by looking at timestamps and completeness by measuring the number of null values.
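In simplified form, such measurements could look like the sketch below: timeliness is checked against a delivery timestamp and completeness against the fraction of null values. Field names, timestamps, and helper functions are illustrative assumptions:

```python
from datetime import datetime

def timeliness_ok(delivery_ts, deadline_ts):
    """CTQ 1 (timeliness): the new dataset version must arrive by the
    weekly Friday deadline."""
    return delivery_ts <= deadline_ts

def null_fraction(rows, column):
    """CTQ 2 (completeness): fraction of missing (null) values in a column;
    the specification demands 0.0."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

# Hypothetical delivery records for one week.
rows = [
    {"employee_id": "a1", "planned_hours": 36},
    {"employee_id": "b2", "planned_hours": None},
]
print(timeliness_ok(datetime(2024, 5, 3, 9, 0), datetime(2024, 5, 3, 17, 0)))
print(null_fraction(rows, "planned_hours"))
```

Plotted per weekly delivery against the specification limits, these two measurements form the control charts for CTQ 1 and CTQ 2.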