Information Quality

Information quality is a relatively new concept to many organisations.  With the growth in data collection and storage, and the mining of that data for business uses, the quality of the information produced becomes increasingly important.  Bad information can lead an organisation to squander resources on ineffective projects, whereas quality information can identify needs, direct targeted services and create efficiencies in everyday work.  Clearly, this is an area worth close attention. 

Data Vs. Information

Information is data in a context.  For example, data could be "01 4548727".  This is just a string of numeric characters until you label it "phone number", giving it a context and meaning.  There is a value-chain model applicable to information, known as the DIKAR model.  

Data => Information => Knowledge => Actions => Results

For example, data collected at an outreach event is taken back to an organisation and stored.  Later, that data is retrieved and put in context as information; the knowledge drawn from it guides the choice of a course of action, and that action produces a result.  

Costs of Poor Information

In the private sector, large organisations have found that the costs of storing, mining and using poor information can be staggering.  According to Tom Redman, in "Data Quality: The Field Guide",  those costs "are roughly 10 percent of revenue for a typical organization.  CEOs know that the 10 percent figure includes only the costs that are easy to measure. This figure does not include other costs, such as bad decisions and low morale that are harder to measure but even more important."

Quality Information

Quality information is defined as "information that is suitable for all of an organisation's purposes, not just my purposes."  This means that the data is suitable for use in a mail merge, an email, statistical analysis and so on.  Each of these uses places slightly different demands on the data, so an organisation must analyse its needs, and the uses to which the data will be put, before drafting any data quality policy.  

Data often comes from many disparate sources, e.g. telephone contacts, company websites, government lists, face-to-face meetings and so forth.  It is also often collected for one purpose, for instance to track who came to an event, and then used for a completely different one, say, to send email newsletters.  The challenge here is to make sure that the data collected includes at least the minimum required for every task the organisation has identified as needing that information.  
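One way to work out that minimum is to list the fields each identified use needs and take their union; any record can then be checked for the uses it cannot yet serve.  The Python sketch below assumes a simple record-as-dictionary layout, and the purposes and field names in it are hypothetical, chosen only to echo the examples above.

```python
# Illustrative only: the purposes and field names are hypothetical examples,
# not a prescribed schema.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "event_attendance": {"name", "event_date"},
    "mail_merge": {"name", "salutation", "address", "county"},
    "email_newsletter": {"name", "email"},
    "statistical_analysis": {"county", "event_date"},
}


def minimum_field_set(required: dict[str, set[str]]) -> set[str]:
    """Return the union of the fields needed by every identified purpose."""
    fields: set[str] = set()
    for purpose_fields in required.values():
        fields |= purpose_fields
    return fields


def unsupported_purposes(record: dict[str, str], required: dict[str, set[str]]) -> list[str]:
    """Return the purposes a record cannot serve because a required field is missing or blank."""
    filled = {key for key, value in record.items() if value and value.strip()}
    return sorted(purpose for purpose, fields in required.items() if not fields <= filled)


if __name__ == "__main__":
    # A record captured only to track who came to an event.
    record = {"name": "Mary Murphy", "event_date": "2010-05-14", "email": ""}
    print(sorted(minimum_field_set(REQUIRED_FIELDS)))
    # ['address', 'county', 'email', 'event_date', 'name', 'salutation']
    print(unsupported_purposes(record, REQUIRED_FIELDS))
    # ['email_newsletter', 'mail_merge', 'statistical_analysis']
```

The event record is fine for its original purpose but, as collected, cannot support the newsletter, the mail merge or the statistical analysis; the union tells collectors what to capture up front.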

Quality information begins with the collectors of data.  Because data is reused many times, the costs arising from any flaws or inadequacies in it are compounded with each reuse.  If the data collected is bad, its errors will persist and grow throughout the DIKAR value chain.  Quality must be built into the processes of an organisation if it is to be successful and sustainable in its information management.

In this information age, organisations simply cannot afford to be without the quality information they need to connect effectively with their customers while negotiating the ever more tangled web of regulations within which they must operate.  Ballou, Madnick and Wang asserted that, once data collectors understand why the data production process is so important, they produce better-quality data than might first be surmised.  This requires educating collectors on the importance of quality, with real examples of its effects; simply telling a data collector that quality is important is unlikely to work.

Quite often an organisation will want to fix its existing data.  This is of limited use: unless the processes for procuring and maintaining the data are addressed, 'cleaned' data will soon be swamped by a relentless tide of new dirty data.  Quality must be designed into the fabric of an organisation.

Measure and Manage

It's been said that 'if you can't measure it, you can't manage it.'  This holds true for information as much as for people.  To improve the quality of existing data, you must first measure its current quality.  This requires the identification of measurable criteria, for example:

  • Completeness of records: are all required fields filled in?
    1. E.g., if a contact name is filled in, make sure that the job title and salutation fields are also filled in.
    2. A name and a phone number alone may well not constitute a complete record.
    3. Complete records can be used for more purposes.
  • Consistency: is the data entered in a uniform manner?
    1. A given field should hold one type of data and only that type, e.g. telephone numbers.
    2. Entries should be formatted consistently, e.g. "(##) #######".
    3. Consistent data makes it simpler to retrieve information from a database.
  • Timeliness: is the data out of date?
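    1. Data protection law requires that personal data be kept no longer than necessary; an organisation might, for example, set a policy of discarding records that have not been verified within two years.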
  • Accuracy: does the data match the real-world values it describes?
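The first three criteria lend themselves to automated checks that report the share of records passing each one.  The sketch below is a minimal Python illustration under stated assumptions: records are dictionaries, the required fields are taken to be contact name, job title, salutation and phone, the phone format is the "(##) #######" pattern above, and "out of date" is a hypothetical policy of not verified within 730 days (roughly two years).  Accuracy is left out because it needs comparison against true values rather than a rule.

```python
import re
from datetime import date, timedelta

# Assumptions for illustration: the required field names, and a two-year
# verification window approximated as 730 days.
REQUIRED = ("contact_name", "job_title", "salutation", "phone")
PHONE_FORMAT = re.compile(r"\(\d{2}\) \d{7}")  # the "(##) #######" format
MAX_AGE = timedelta(days=730)


def is_complete(record: dict) -> bool:
    """Every required field is present and non-blank."""
    return all((record.get(field) or "").strip() for field in REQUIRED)


def is_consistent(record: dict) -> bool:
    """The phone number follows the agreed format exactly."""
    return PHONE_FORMAT.fullmatch(record.get("phone") or "") is not None


def is_timely(record: dict, today: date) -> bool:
    """The record has been verified within the allowed window."""
    verified = record.get("last_verified")
    return verified is not None and today - verified <= MAX_AGE


def measure(records: list[dict], today: date) -> dict[str, float]:
    """Return the share of records that pass each criterion."""
    if not records:
        raise ValueError("no records to measure")
    n = len(records)
    return {
        "completeness": sum(is_complete(r) for r in records) / n,
        "consistency": sum(is_consistent(r) for r in records) / n,
        "timeliness": sum(is_timely(r, today) for r in records) / n,
    }


if __name__ == "__main__":
    records = [
        {"contact_name": "Mary Murphy", "job_title": "Manager", "salutation": "Ms",
         "phone": "(01) 4548727", "last_verified": date(2009, 3, 1)},
        {"contact_name": "John Byrne", "job_title": "", "salutation": "Mr",
         "phone": "01 4548727", "last_verified": date(2006, 1, 15)},
        {"contact_name": "Aoife Kelly", "job_title": "Director", "salutation": "Dr",
         "phone": "(01) 6771234", "last_verified": date(2007, 11, 30)},
    ]
    print(measure(records, today=date(2010, 1, 1)))
```

With these made-up records, two of three are complete and consistently formatted, but only one was verified recently enough, which gives a baseline against which later remedial work can be measured.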

Fixing Data

If a set of data is measured against identified criteria, then it is possible to establish a course of remedial actions to increase the overall quality.  

For example, remedial actions might be to ensure that

  1. all records contain an entry in the "county" field, even where the postman would not need it, e.g. Cork, Co Cork.
  2. all county entries follow a set format, e.g. Co Dublin, not Co. Dublin, County Dublin or Dubh Linn.
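Both actions can be partly automated.  The Python sketch below is one possible approach, not a complete solution: it rewrites county variants into the "Co <Name>" house format and fills an empty county from the town where a lookup allows.  The alias and town-to-county tables are hypothetical and deliberately tiny; a real organisation would build them from its own data and route anything unmatched to a person.

```python
import re

# Hypothetical lookup tables; a real organisation would build these from its own data.
ALIASES = {"dubh linn": "Dublin"}
TOWN_TO_COUNTY = {"cork": "Cork", "dublin": "Dublin", "galway": "Galway"}

# Matches a leading "County ", "Co. ", "Co." or "Co " prefix, in any case.
PREFIX = re.compile(r"^(?:county\s+|co\.\s*|co\s+)", re.IGNORECASE)


def normalise_county(raw: str) -> str:
    """Rewrite a county entry into the house format 'Co <Name>'."""
    name = " ".join(raw.split())  # collapse stray whitespace
    name = PREFIX.sub("", name)
    name = ALIASES.get(name.lower(), name.title())
    return f"Co {name}"


def fill_county(record: dict) -> dict:
    """Normalise an existing county, or infer a missing one from the town if possible."""
    county = (record.get("county") or "").strip()
    if county:
        return {**record, "county": normalise_county(county)}
    town = (record.get("town") or "").strip().lower()
    if town in TOWN_TO_COUNTY:
        return {**record, "county": f"Co {TOWN_TO_COUNTY[town]}"}
    return dict(record)  # cannot infer; leave for manual follow-up


if __name__ == "__main__":
    for entry in ["Co. Dublin", "County Dublin", "Dubh Linn", "co dublin"]:
        print(normalise_county(entry))  # each prints "Co Dublin"
    print(fill_county({"town": "Cork", "county": ""}))
```

Returning a new dictionary rather than editing the record in place keeps the original entry available for comparison, which helps when measuring how much the remedial action actually changed.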


Data quality can also be estimated by taking a sample of records and determining how many have values that differ from the true values.  If, after corrective action, some data still falls below the standard set by the organisation's newly developed policy, that bad data should be discarded.
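The sampling estimate can be sketched as follows.  This is a minimal Python illustration, assuming records are keyed by an ID and that a hypothetical `verify` callback returns the true values for a record (in practice, perhaps by phoning the contact), so only the sampled records need checking.  The 95% interval uses the simple normal approximation, which is rough for small samples or error rates near 0 or 1.

```python
import math
import random
from typing import Callable, Optional


def estimate_error_rate(
    records: dict[str, dict],
    verify: Callable[[str], dict],
    sample_size: int,
    seed: Optional[int] = None,
) -> tuple[float, float, float]:
    """Estimate the share of records whose stored values differ from the true values.

    Returns (estimate, lower, upper), where lower and upper bound a
    normal-approximation 95% confidence interval clipped to [0, 1].
    """
    if not 0 < sample_size <= len(records):
        raise ValueError("sample_size must be between 1 and the number of records")
    rng = random.Random(seed)
    sample_ids = rng.sample(sorted(records), sample_size)
    # Count sampled records whose stored values disagree with the verified ones.
    errors = sum(records[rid] != verify(rid) for rid in sample_ids)
    p = errors / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    stored = {f"r{i}": {"county": "Co Dublin"} for i in range(100)}
    truth = dict(stored)
    for i in range(0, 100, 10):  # ten records hold a wrong county
        truth[f"r{i}"] = {"county": "Co Cork"}
    estimate, low, high = estimate_error_rate(stored, truth.__getitem__, sample_size=30, seed=1)
    print(f"estimated error rate {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

A wide interval is a signal to check a larger sample before deciding whether the data meets the organisation's policy.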

The Outlook

It is essential for an organisation to build quality processes into its fabric: teaching data collectors why data quality is important, making sure the correct collection rules and principles are followed, and collecting, processing and producing data in a considered and well-defined manner.  Quality comes not from audits, although regular measurement can help maintain and improve it, but from within the organisation itself, from its people and their daily endeavours.

References:

Redman, Tom. Data Quality: The Field Guide. Digital Press.
Ballou, Donald, Stuart Madnick and Richard Wang. "Assuring Information Quality." Journal of Management Information Systems.