Data science

How do you optimize data quality for genuinely effective data-driven work?

May 27, 2024 - 4 minutes reading time
Article by Natan Van Der Knaap

In the first article of this series, we established that data-driven work requires high data quality. In this second and final article, we will tell you how to improve data quality from an organizational and technical perspective.

As you read in the first article, we judge data quality by the extent to which data meets the expectations of its stakeholders, or users. Yet many organizations fail to define what makes data fit for purpose (DAMA International, 2017). This user perspective is essential for making informed choices and deliberate financial investments in data management: decisions and investments to improve data quality must contribute to the users' goal of creating more value. Organizing conversations between users, administrators and data owners raises these parties' awareness and helps establish the reasonableness of certain quality requirements. Balancing everyone's interests, together with laws and regulations, is what data quality governance encompasses.

That said, the solution does not always lie in more quality rules or more systems. Often there is an underlying cause for the current shortfall in data quality, and it is wise to investigate that first, although this is not always feasible. The choice between short-term action and first investigating the underlying cause rests with the owner and end user, and differs per use case.

We ended the first article with a quality check you can perform on your critical data. By comparing the results of this check with the intended quality, you can identify deviations. Besides the technical script, deviations can also be found by talking to employees across the organization and asking what obstacles they run into when using certain data. This yields improvement points that help you reach the intended data quality.


Improving data quality

Once improvements are identified, the key is to prioritize and implement them based on analysis and discussion with stakeholders: users, administrators and owners. This builds internal consensus on which data quality improvements come first. Once it is clear where the organization ultimately wants to go, start with concrete use cases. Having a goal is crucial, but so is reaching it incrementally, with intermediate results. That way, the organization quickly and continuously sees the added value of your improvement projects.

People experience more efficient use of data and better support for organizational goals. Repeating the above steps creates a data quality program. In short, a data quality program involves managing data throughout its lifecycle by setting standards, building quality into the processes that create, transform and store data, and measuring data against those standards (DAMA International, 2017). Data quality thus becomes part of mainstream data governance, on which more later.

Improving data quality involves both organizational and technical actions, and both are usually described in a data quality program.

‘It is important that the organization is aware of the need for proper data quality'

Organization

It is important that the organization is aware of the need for good data quality. Ultimately, data quality is not ensured by a collection of tools and concepts, but by a mindset that leads employees to always consider data quality in their actions. Data quality is not just the responsibility of a data quality team or IT group: every employee who "touches" data can affect its quality.

Raising an organization's awareness of the importance of data quality often requires a significant cultural change. Cultivating quality awareness among all employees who handle critical data is paramount; ultimately, the long-term success of any improvement effort depends on whether the organization is willing to change its culture and adopt a quality-oriented mindset (DAMA International, 2017).

Education

Education is an important tool for increasing awareness. By using education to increase employees' understanding of data, organizations can use data more effectively, communicate insights to others and formulate more relevant questions. This increased proficiency, also known as data literacy, contributes to a more informed demand for and supply of data, allowing for better prioritization of data management efforts. In addition, it creates a positive attitude toward and responsible handling of data.

'More and more organizations are discovering data governance to regulate the quality and cost of their data'

Data governance

In addition to awareness and literacy, organizations use data management and data governance to further ensure data quality. Whereas data management is primarily about execution, data governance addresses steering: establishing guidelines, standards, roles and responsibilities. Many organizations use data governance to regulate the quality of their data, ensuring that it meets legal and ethical standards and thus enables reliable decision-making (Charles et al., 2022). Especially within the public sector, this improved decision-making capacity leads to better policies and the promotion of public values such as security, accountability and transparency (Matheus et al., 2020). Some key roles for ensuring data quality are the data user, the data owner and the data steward.

Importance of collaboration between departments

A key issue is whether it is better to implement a data quality program from the top down or the bottom up. In general, a hybrid approach works best: top-down for sponsorship, consistency and resources; bottom-up to discover what is actually broken and to achieve incremental successes (DAMA International, 2017). You also want to draw on domain knowledge, because data administrators and users within a domain know best when its data is of good quality. They are best placed to determine the requirements for their data and the training and guidelines that go with them. Finally, you want to ensure data quality across the entire lifecycle, beyond the boundaries of systems, processes and departments. That requires good collaboration: sharing knowledge and promoting best practices in data management creates a culture of transparency and cooperation, which ultimately leads to better decision-making and operational efficiency. Collaboration between different departments is therefore crucial to ensuring data quality.

Technological aspects

Technology can also be used to improve data quality. Consider automated, continuous quality checks on your critical data, applying quality dimensions and reporting the results periodically. These can be technical data quality rules, such as requiring that data be entered in the correct format. They can also go a step further with business rules, for example flagging a child registered as receiving welfare benefits to which they are not entitled. The rules are a form of metadata, implemented in the databases or applications that create, transform or consume data.
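To make this concrete, such technical quality rules can be sketched as small check functions, one per quality dimension, whose pass rates are then reported periodically. The field names, formats and records below are hypothetical illustrations, not rules from this article:

```python
import re

# Hypothetical records as a source system might hold them.
records = [
    {"postcode": "1234 AB", "birth_date": "2001-05-17"},
    {"postcode": "12AB",    "birth_date": ""},
]

# One rule per field, each tied to a quality dimension.
rules = {
    "postcode":   lambda v: bool(re.fullmatch(r"\d{4} [A-Z]{2}", v)),  # validity: correct format
    "birth_date": lambda v: v != "",                                   # completeness: value present
}

def check_quality(records, rules):
    """Return, per field, the share of records that pass its rule."""
    return {
        field: sum(rule(r[field]) for r in records) / len(records)
        for field, rule in rules.items()
    }

print(check_quality(records, rules))  # → {'postcode': 0.5, 'birth_date': 0.5}
```

Scores like these, computed on a schedule, are what the periodic quality reports mentioned above would contain.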

Quality of metadata

Metadata, simply put data about data, plays a crucial role in managing data quality. It defines what the data represents and helps in understanding it. Metadata thereby supports data quality improvement by, among other things, capturing the data quality rules that describe how data must exist to be useful and usable within an organization. So, to complicate matters further, your metadata (which, among other things, describes quality) must itself be of good quality.
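As an illustration of rules stored as metadata, and of checking that metadata's own quality, consider this minimal sketch. The dimension names follow common usage in the data quality literature; the fields and rules themselves are hypothetical:

```python
# Quality rules captured as metadata: the rules are plain data that
# other tooling can read, validate and report on.
quality_rules_metadata = [
    {"field": "email",      "dimension": "validity",     "rule": "must match email format"},
    {"field": "birth_date", "dimension": "completeness", "rule": "must not be empty"},
]

# Because the rules are data, their own quality can be checked too:
# every rule must name a field and use a recognized quality dimension.
KNOWN_DIMENSIONS = {"completeness", "validity", "accuracy",
                    "consistency", "timeliness", "uniqueness"}

def check_metadata_quality(rules):
    """Return the rules that are themselves malformed."""
    return [r for r in rules
            if not r.get("field") or r.get("dimension") not in KNOWN_DIMENSIONS]

print(check_metadata_quality(quality_rules_metadata))  # → [] : metadata is in order
```

The point of the sketch is the last function: the metadata describing quality gets a quality check of its own.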

Collaboration

Collaboration between different departments is also an important factor. In fact, different departments within an organization often generate and use various types of data. By working together, departments can consolidate and integrate data from different sources, resulting in a more consistent dataset. This collaboration also facilitates data validation for accuracy, consistency, and completeness, and promotes data standardization, which improves interoperability and interchangeability.

The role of source applications in ensuring data quality within organizations is vital. This is because quality is best improved and ensured at the source rather than later in the process. Applications contribute to improving data quality in several ways:

  • Data entry: business applications can implement automated validation checks during the entry process.
  • Data integrity: applications can implement security measures and access controls to maintain integrity.
  • Process automation: automating business processes reduces the likelihood of human error and increases data consistency.
  • Data integration: by integrating and consolidating data from different sources, business applications create a single, consistent view of data across the organization (e.g. warehouses).
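The first bullet, validation at data entry, can be sketched as follows. The email and customer ID checks are hypothetical examples of what a source application might enforce before accepting a record:

```python
import re

def validate_entry(record):
    """Hypothetical entry-time checks a source application might enforce."""
    errors = []
    # Validity: a rough email format check (illustrative, not RFC-complete).
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email", "")):
        errors.append("email: invalid format")
    # Completeness: a mandatory identifier.
    if not record.get("customer_id"):
        errors.append("customer_id: required")
    return errors

print(validate_entry({"email": "a@b.com", "customer_id": "C1"}))  # → []
print(validate_entry({"email": "not-an-email"}))
# → ['email: invalid format', 'customer_id: required']
```

Rejecting or flagging a record at this point is far cheaper than repairing it later in the process, which is exactly why the source is the preferred place to ensure quality.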

Finally, dashboards allow you to monitor data quality continuously. The more data-driven decisions you make, the more important clean, reliable data becomes. My tip is therefore to develop dashboards per domain, allowing process owners to monitor and analyze the quality of their data. Furthermore, you can use quality rules to identify outliers and correct them (semi-)automatically. Think of a signal when quality drops below a certain level; you can then investigate the root cause of the lower quality.
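The signal described above, alerting when quality drops below a certain level, can be sketched as a simple threshold check over per-field quality scores. The field names, scores and threshold are hypothetical:

```python
def quality_signal(scores, threshold=0.95):
    """Return the fields whose quality score falls below the threshold."""
    return [field for field, score in scores.items() if score < threshold]

# Scores as a dashboard might report them after the periodic checks.
alerts = quality_signal({"postcode": 0.98, "birth_date": 0.91})
print(alerts)  # → ['birth_date']
```

Each alert is then the starting point for a root-cause investigation rather than an automatic fix.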

Ensuring data quality

Many of these improvements also help ensure data quality going forward. Education, for example, keeps awareness high in the organization, and quality rules implemented in applications can (semi-)automatically detect and correct quality issues. In combination with the data quality program, the various improvement initiatives are given direction, and roles and responsibilities define to whom you can escalate issues. In addition, by applying data quality dimensions to critical data, you can measure the new quality level after each improvement and prioritize further improvements.

Continuous process

In this article, I deliberately say program rather than project. I do this because data quality is a continuous process, closely tied to the Deming PDCA (Plan-Do-Check-Act) cycle. The long-term success of a data quality program depends on an organization's willingness to change its culture and adopt a quality-oriented mindset. Discovering and refining business rules for data quality is likewise an ongoing process.

More information

Importance of accurate and reliable data

The importance of accurate and reliable data as a foundation for the success of organizations is obvious. That's why Centric offers several customized services to transform your data into a reliable and valuable resource.

Related articles
High data quality is prerequisite for data-driven work
Data science Retail Finance Public Logistic
In this article, you will read why high data quality is important if you want to work data-driven.
Sustainability from a data mindset
Data science Retail Logistic
In this article, read why sustainability benefits from a data mindset.
Structured work on data quality
Data science
Few organizations still work on data quality in a structured way. Natan van der Knaap, graduate student a ...