DQM technologies cover human error

Apr 25, 2006 | 2 mins
Data Center

* Types of data-quality management technologies

Companies that take data-quality management seriously should evaluate the key technologies that can help automate processes and eliminate human error.

Leading-edge companies use all the technologies described below for a complete DQM deployment. These technologies should be used according to a clear schedule, and throughout the information lifecycle. Most are available from large all-in-one suites and managed-services vendors as well as in smaller, niche products.

They include:

Data Profiling, also called discovery, is designed to provide a “state of the data” analysis of all the information in an enterprise. The goal is to understand the content and get a thorough assessment of its accuracy and completeness. The process should include metadata validation (making sure that the data definitions are up to date and serve the business, and that the entered data matches the metadata descriptors); pattern and statistical analysis (including frequencies and ranges); identifying redundant data and mapping the relationships among sources; and validating business rules and policies across entries.
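A minimal profiling sketch in Python shows the idea: for each field, report completeness, value frequencies, and numeric ranges. The records and field names here are invented for illustration, not drawn from any particular product.

```python
from collections import Counter

# Hypothetical sample records; in practice these would come from a
# production database or extract.
records = [
    {"state": "NY", "age": 34},
    {"state": "ny", "age": 41},
    {"state": "NY", "age": None},
    {"state": "CA", "age": 29},
]

def profile(records, field):
    """Report completeness, frequency counts, and numeric range for one field."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v is not None]
    report = {
        "completeness": len(present) / len(values),
        "frequencies": Counter(present),
    }
    numeric = [v for v in present if isinstance(v, (int, float))]
    if numeric:
        report["range"] = (min(numeric), max(numeric))
    return report

print(profile(records, "state"))
print(profile(records, "age"))
```

Note how even this crude pass surfaces problems: “ny” and “NY” count as distinct values (a candidate for standardization), and the missing age shows up as incomplete data.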

Data Standardization, which covers a wide range of quality initiatives, including: uniform spellings, abbreviations, and data categories; breaking down multivalue fields, so that each element is distinct and can be manipulated and reported on; translating data codes into values that are understandable by nonexperts; linking relationships, so that multiple instances of the same entry are recognized as such; and “householding,” which attaches multiple entries to a single group to avoid duplicate efforts or reporting on similar data (say, a husband and wife listed separately in a database).
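Several of these initiatives can be sketched in a few lines of Python. The abbreviation table, field names, and sample entries below are hypothetical; real standardization tools carry far larger rule sets.

```python
# Hypothetical abbreviation rules for uniform spellings.
ABBREVIATIONS = {"st.": "Street", "st": "Street", "ave": "Avenue", "ave.": "Avenue"}

def standardize_address(addr):
    """Expand abbreviations so equivalent addresses compare equal."""
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in addr.split())

def split_multivalue(field, sep=";"):
    """Break a multivalue field (e.g., 'phone;fax;mobile') into distinct elements."""
    return [part.strip() for part in field.split(sep) if part.strip()]

def household(entries):
    """Group entries sharing a standardized address into a single household."""
    groups = {}
    for e in entries:
        groups.setdefault(standardize_address(e["address"]), []).append(e["name"])
    return groups

entries = [
    {"name": "Pat Doe", "address": "12 Main St."},
    {"name": "Sam Doe", "address": "12 Main Street"},
]
print(household(entries))
# {'12 Main Street': ['Pat Doe', 'Sam Doe']}
```

Because the addresses are standardized before grouping, the two Does land in one household even though their entries were typed differently.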

Data Enrichment, to ensure that data is as complete as possible, for as long as it is needed. This includes name and address verification, as well as adding variables (such as geographic and census data) to the information the enterprise itself may not have, drawn from third-party sources.
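The third-party lookup can be sketched as a simple merge keyed on a shared field. The reference table below stands in for purchased census or geographic data; its keys, field names, and values are invented.

```python
# Stand-in for a third-party reference dataset keyed by ZIP code
# (values are invented for illustration).
THIRD_PARTY = {"10001": {"median_income": 67000, "region": "Northeast"}}

def enrich(record):
    """Append third-party variables the enterprise does not hold itself."""
    extra = THIRD_PARTY.get(record.get("zip"), {})
    return {**record, **extra}

customer = {"name": "Pat Doe", "zip": "10001"}
print(enrich(customer))
```

Records with no match in the reference data pass through unchanged, so enrichment never destroys what the enterprise already has.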

Data Integration, which should apply data quality processes to data in movement (i.e., to data as it moves from one source to another) and ensure all data across applications conforms to uniform rules.
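One way to picture quality checks on data in movement is a gate the records pass through between source and target, enforcing the same rules everywhere. The rules and sample records here are hypothetical.

```python
# Hypothetical shared quality rules, applied to every record in transit.
RULES = [
    ("email present", lambda r: bool(r.get("email"))),
    ("country is ISO-2", lambda r: len(r.get("country", "")) == 2),
]

def move(records):
    """Route records: clean ones continue to the target, failures are held."""
    clean, rejected = [], []
    for r in records:
        failures = [name for name, ok in RULES if not ok(r)]
        (rejected if failures else clean).append((r, failures))
    return clean, rejected

source = [
    {"email": "pat@example.com", "country": "US"},
    {"email": "", "country": "USA"},
]
clean, rejected = move(source)
# The first record passes; the second fails both rules and is held for repair.
```

Because every application routes its data through the same rule set, no system can quietly reintroduce values the rest of the enterprise has outlawed.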

Data Monitoring, to keep companies up to date on the quality of information in the enterprise. DQM is an ongoing process, and managers should be automatically alerted when data quality has fallen below acceptable limits, in aggregate, or when specific pieces of mission-critical data have been dirtied. A good data-monitoring tool should also let managers identify data-quality trends, so they know where problems lie, and what to fix.
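The alerting and trend-spotting described above can be sketched as a threshold check over periodic quality scores. The threshold, dates, and scores are invented for illustration.

```python
# Hypothetical acceptable limit for a quality score (e.g., completeness).
THRESHOLD = 0.95

def check(history):
    """history: list of (period, quality score). Return alerts and the trend."""
    alerts = [(period, score) for period, score in history if score < THRESHOLD]
    trend = history[-1][1] - history[0][1]  # negative means quality is slipping
    return alerts, trend

history = [("2006-01", 0.98), ("2006-02", 0.96), ("2006-03", 0.93)]
alerts, trend = check(history)
print(alerts)   # [('2006-03', 0.93)]
print(trend < 0)  # True: quality is trending downward
```

An alert fires only for the period that crossed the limit, while the trend tells managers the problem is worsening rather than a one-off blip.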