Data science

Structured work on data quality

March 8, 2024 - 6 minutes reading time
Article by Natan van der Knaap

Good quality data is essential to the functioning of an organization. This is well known. Yet few organizations work on data quality in a structured way. Natan van der Knaap researched this and developed a widely applicable model for data governance.

Van der Knaap recently graduated from Tilburg University and works as a data & analytics consultant at Centric. Why did he choose data governance for his thesis research? "During my master's in Information Management, I learned that data governance is mainly about the human aspect of improving data. You can buy tools to improve data quality, but ultimately people have to use those tools. I find that focus on the human aspect interesting." He delved into data governance and discovered that there was not yet a framework for master data governance. "I wondered what I could contribute to this research field as a student. When I found out that no framework existed yet, I thought it was a great opportunity to add something! That way I could combine both the human and data aspects."

What is master data?

In his thesis research, Van der Knaap developed a framework for master data governance. "Master data are important data that are widely used in the organization. For local government, for example, it is the name and address details of residents." It is important that every department, process and system uses identical master data, such as the same resident address. Data that falls outside master data includes transactional data, such as process lead times or receipts. In short, master data provides a shared reference point for an organization's core objects, such as suppliers, products or employees.
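
As a rough illustration of this distinction, the sketch below keeps one master record per resident and lets transactional records from two departments refer to it by identifier instead of storing their own copy of the address. All names and sample values (Resident, Transaction, resident_id, the departments) are hypothetical and are not taken from Van der Knaap's research.

```python
from dataclasses import dataclass


@dataclass
class Resident:
    """Master data: one shared record per resident (hypothetical fields)."""
    resident_id: str
    name: str
    address: str


@dataclass(frozen=True)
class Transaction:
    """Transactional data: an event that refers to master data by id."""
    department: str
    resident_id: str
    lead_time_days: int


# A single master registry shared by every department (illustrative values).
residents = {"R-001": Resident("R-001", "J. Jansen", "Main Street 1")}

transactions = [
    Transaction("permits", "R-001", lead_time_days=12),
    Transaction("taxes", "R-001", lead_time_days=3),
]


def address_for(tx: Transaction) -> str:
    """Look up the address in the master registry instead of storing a copy."""
    return residents[tx.resident_id].address


# An address change is made once, in the master record...
residents["R-001"].address = "Station Road 5"

# ...and every department sees the same value afterwards.
addresses = {tx.department: address_for(tx) for tx in transactions}
```

Because the transactions only hold a reference, there is no second copy of the address that could drift out of sync between departments.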

Key insights

In his research, Van der Knaap looked at best practices in data governance and master data. Combining these, he arrived at a number of hypotheses about the most appropriate approach to master data governance. He tested these hypotheses in interviews with seven experts from different organizations, including four from the public sector. The interviews gave him a number of insights. "I thought that many organizations were already quite advanced with data governance, given all the literature on the subject. But they're not. That also means that the main focus now is on data quality in a broad sense, not just master data." Together with the experts, Van der Knaap broadened the governance framework he had developed so that it can be used not only for master data, but for promoting data quality in general. Another striking insight is that organizations make clear choices when it comes to data quality. "Value creation comes first, the main focus is on how to get as much value as possible out of data. Investments in data quality are made from that point of view. Obvious perhaps, but not so in science. There the focus is on data quality in general and no distinction is made between data."

'The main focus is on how to get as much value out of data as possible'

‐ Natan van der Knaap

Starting small

Data governance is seen as something big and complicated, and that's true, says Van der Knaap based on conversations with other data experts. "It has many different aspects. That's why it's wise to start small. You often know best who within an organization is working with data and who makes the choices about it. The responsibilities are often already there, but they are not formalized. If you start with that, you get more control over the data." Formalizing these responsibilities gives you a firmer grip on the quality of your (master) data. It is also crucial to know which data is used where in your organization. "By understanding the value of the data and where it is used, you can collaborate with users to make and execute plans to improve quality. In general, experiencing the benefits of data in your own daily work accelerates the improvement of data quality," Van der Knaap adds.

'When you experience the benefits of data in your own daily work, improving data quality gains momentum'

‐ Natan van der Knaap

Working on data quality in a structured way

With the framework developed by Van der Knaap, organizations can work in a structured way to improve and secure data quality. It consists of roles and measures that work both top-down and bottom-up. From the top, for example, strategy and standards are determined and resources are made available. One of the roles at this level is the executive sponsor. From the bottom up, the organization determines how the standards are fleshed out and what constitutes good data quality. Roles involved here include the data user and the data steward. "If you set up data governance this way, you involve the whole organization and it becomes something that is supported by the whole organization." Exactly how you set up the framework depends on the organization and the context in which it operates.

Expanding services

Van der Knaap's findings fit with the services Centric wants to further develop in the field of data management. "We already have various tools to map the quality of data, for example. We are expanding these services, working with customers to improve their data quality and making employees more aware of its importance. Examples include training courses, guidance on improvement projects and knowledge sessions." In addition to supporting customers in improving application data quality, Centric also wants to help organizations on a more strategic level with data governance issues. The framework developed by Van der Knaap fits the bill. Centric is also continuing to work on technology that automatically checks the quality of data and adjusts it where necessary. "There are so many possibilities when it comes to improving data quality and setting up good data governance. We are only at the beginning of it," Van der Knaap concludes.
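
To give an idea of what automatically checking and adjusting data can look like in its simplest form, here is a minimal sketch. It is not Centric's technology. The required fields, the postcode format ("1234 AB") and the function names clean and check are assumptions made purely for illustration.

```python
import re

# Assumed rules for this sketch; a real organization would define its own.
REQUIRED_FIELDS = ("name", "address", "postcode")
POSTCODE_PATTERN = re.compile(r"^\d{4} [A-Z]{2}$")  # assumed format, e.g. "1234 AB"


def clean(record: dict) -> dict:
    """Apply safe automatic adjustments: trim whitespace, normalize postcode."""
    cleaned = {
        key: "" if value is None else str(value).strip()
        for key, value in record.items()
    }
    postcode = cleaned.get("postcode", "").replace(" ", "").upper()
    if len(postcode) == 6:
        cleaned["postcode"] = f"{postcode[:4]} {postcode[4:]}"
    return cleaned


def check(record: dict) -> list[str]:
    """Return a list of readable quality issues (an empty list means OK)."""
    issues = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    postcode = record.get("postcode", "")
    if postcode and not POSTCODE_PATTERN.match(postcode):
        issues.append("invalid postcode")
    return issues


# Usage: adjust what can be fixed safely, then report what still needs a person.
raw = {"name": " J. Jansen ", "address": "Main Street 1", "postcode": "1234ab"}
cleaned = clean(raw)
remaining_issues = check(cleaned)
```

The split between clean and check mirrors the point made in the article: tools can fix the mechanical part, but the remaining issues still have to be picked up by people.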

Schematic representation of the data governance framework

Antecedents: the design of data governance is different for each organization. It depends on external and internal factors that are different for each organization, such as culture and regulations.

Analytical governance: this is a mechanism for examining what data is used for and what value it delivers to the organization.

Decision strategy: here organizations have a choice between a centralized, decentralized or federated model. A federated model is the best way to involve both the top of the organization and the people who carry out the work in improving data quality.

Roles and responsibilities: there are many different roles and responsibilities that an organization can implement. It is important to involve every individual who has a role in ensuring data quality, both at the top and in execution. In a large organization, a data governance office provides the bridge between top and execution.

Data ownership: you can interpret this in different ways. It is important that it is clear to whom you can go if there are problems with data quality. Best practices suggest that the process owner is the appropriate person to also act as data owner.

In the center of the circle are the corporate values to which data use must ultimately contribute, for example transparency or efficiency. Ultimately, all governance initiatives are designed to ensure the quality of (master) data, thereby realizing more value.
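
Two of the framework's elements, the decision strategy and data ownership, can be written down in a very small registry. The sketch below is one possible way to record them, assuming the process owner also acts as data owner. The class names, fields and sample entry are hypothetical and are not part of the framework itself.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionStrategy(Enum):
    """The three decision models named in the framework."""
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"
    FEDERATED = "federated"


@dataclass(frozen=True)
class DataAsset:
    """One governed data set with its formalized responsibilities."""
    name: str
    process: str
    data_owner: str    # assumed here to be the process owner
    data_steward: str
    used_by: tuple     # departments or systems that use the data


# Illustrative entry only; names and roles are made up for this sketch.
REGISTRY = {
    "resident_addresses": DataAsset(
        name="resident_addresses",
        process="civil registration",
        data_owner="head of civil affairs",
        data_steward="registration team lead",
        used_by=("permits", "taxes"),
    ),
}

STRATEGY = DecisionStrategy.FEDERATED


def contact_for(asset_name: str) -> str:
    """Return who to approach when a data set has a quality problem."""
    asset = REGISTRY.get(asset_name)
    if asset is None:
        raise KeyError(f"no owner registered for {asset_name!r}")
    return asset.data_owner


owner = contact_for("resident_addresses")
```

Even a list this small makes existing responsibilities explicit and answers the question of whom to approach about a quality problem.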
