research-article

Methodologies for data quality assessment and improvement

Published: 30 July 2009

Abstract

The literature provides a wide range of techniques to assess and improve the quality of data. Due to the diversity and complexity of these techniques, research has recently focused on defining methodologies that support the selection, customization, and application of data quality assessment and improvement techniques. The goal of this article is to provide a systematic and comparative description of such methodologies. Methodologies are compared along several dimensions, including the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and, finally, the types of information systems addressed by each methodology. The article concludes with a summary description of each methodology.
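To make "assessing a data quality dimension" concrete, here is a minimal Python sketch. It is not taken from any of the surveyed methodologies. It scores two commonly used dimensions, completeness and accuracy, with the simple-ratio form described by Pipino et al. [2002]: one minus the share of undesirable outcomes. The record layout, the field names, the sample data, and the `reference` lookup table that stands in for trusted real-world values are all hypothetical and were chosen only for illustration.

```python
from typing import Any, Mapping, Sequence

# Hypothetical record shape: a flat mapping from field name to value.
Record = Mapping[str, Any]


def simple_ratio(undesirable: int, total: int) -> float:
    """Simple-ratio metric: 1 - undesirable / total.

    An empty input scores 1.0 here; that convention is an assumption.
    """
    if total == 0:
        return 1.0
    return 1.0 - undesirable / total


def completeness(records: Sequence[Record], fields: Sequence[str]) -> float:
    """Fraction of (record, field) cells that are present and non-empty."""
    total = len(records) * len(fields)
    missing = sum(
        1 for r in records for f in fields if r.get(f) in (None, "")
    )
    return simple_ratio(missing, total)


def accuracy(
    records: Sequence[Record],
    key: str,
    field: str,
    reference: Mapping[Any, Any],
) -> float:
    """Fraction of checkable records whose `field` equals the reference value.

    Records whose key is not in `reference` are skipped rather than counted
    as errors (an assumption; other policies are possible).
    """
    checked = [r for r in records if r.get(key) in reference]
    wrong = sum(1 for r in checked if r.get(field) != reference[r[key]])
    return simple_ratio(wrong, len(checked))


# Hypothetical sample data: a small customer table and a trusted registry
# of cities keyed by customer id.
customers = [
    {"id": 1, "name": "Rossi", "city": "Milano"},
    {"id": 2, "name": "Bianchi", "city": None},
    {"id": 3, "name": "", "city": "Roma"},
    {"id": 4, "name": "Verdi", "city": "Torino"},
]
registry = {1: "Milano", 3: "Napoli", 4: "Torino"}

print(completeness(customers, ["name", "city"]))  # 2 of 8 cells missing -> 0.75
print(round(accuracy(customers, "id", "city", registry), 3))  # 1 of 3 wrong -> 0.667
```

In this sketch, accuracy is judged only for records that the reference covers. A methodology could instead treat uncovered records as unknown or as errors, and that choice changes the score. This is one reason the surveyed methodologies differ in how they define and measure each dimension.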

References

  1. Abiteboul, S., Buneman, P., and Suciu, D. 2000. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers.
  2. Aiken, P. 1996. Data Reverse Engineering. McGraw Hill.
  3. Arenas, M., Bertossi, L., and Chomicki, J. 1999. Consistent query answers in inconsistent databases. In Proceedings of the 18th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, 68--79.
  4. Atzeni, P. and De Antonellis, V. 1993. Relational Database Theory. Benjamin/Cummings.
  5. Atzeni, P., Merialdo, P., and Sindoni, G. 2001. Web site evaluation: Methodology and case study. In Proceedings of the International Workshop on Data Semantics in Web Information Systems (DASWIS).
  6. Ballou, D. and Pazer, H. 1985. Modeling data and process quality in multi-input, multi-output information systems. Manage. Sci. 31, 2.
  7. Ballou, D., Wang, R., Pazer, H., and Tayi, G. 1998. Modeling information manufacturing systems to determine information product quality. Manage. Sci. 44, 4.
  8. Basile, A., Batini, C., Grega, S., Mastrella, M., and Maurino, A. 2007. ORME: A new methodology for information quality and Basel II operational risk. In Proceedings of the 12th International Conference on Information Quality, Industrial Track.
  9. Basili, V., Caldiera, G., and Rombach, H. 1994. Goal question metric paradigm.
  10. Baskarada, S., Koronios, A., and Gao, J. 2006. Towards a capability maturity model for information quality management: A TDQM approach. In Proceedings of the 11th International Conference on Information Quality.
  11. Batini, C., Cabitza, F., Cappiello, C., and Francalanci, C. 2008. A comprehensive data quality methodology for Web and structured data. Int. J. Innov. Comput. Appl. 1, 3, 205--218.
  12. Batini, C. and Scannapieco, M. 2006. Data Quality: Concepts, Methodologies and Techniques. Springer Verlag.
  13. Bertolazzi, P., De Santis, L., and Scannapieco, M. 2003. Automatic record matching in cooperative information systems. In Proceedings of the ICDT International Workshop on Data Quality in Cooperative Information Systems (DQCIS).
  14. Bettschen, P. 2005. Master data management (MDM) enables IQ at Tetra Pak. In Proceedings of the 10th International Conference on Information Quality.
  15. Bilke, A., Bleiholder, J., Böhm, C., Draba, K., Naumann, F., and Weis, M. September 2005. Automatic data fusion with HumMer. In Proceedings of the VLDB Demonstration Program.
  16. Bovee, M., Srivastava, R., and Mak, B. September 2001. A conceptual framework and belief-function approach to assessing overall information quality. In Proceedings of the 6th International Conference on Information Quality.
  17. Buneman, P. 1997. Semi-structured data. In Proceedings of the 16th ACM Symposium on Principles of Database Systems (PODS).
  18. Calì, A., Calvanese, D., De Giacomo, G., and Lenzerini, M. 2004. Data integration under integrity constraints. Inform. Syst. 29, 2, 147--163.
  19. Calvanese, D., De Giacomo, G., and Lenzerini, M. 1999. Modeling and querying semi-structured data. Network. Inform. Syst. J. 2, 2, 253--273.
  20. Cappiello, C., Francalanci, C., and Pernici, B. 2003. Preserving Web sites: A data quality approach. In Proceedings of the 7th International Conference on Information Quality (ICIQ).
  21. Cappiello, C., Francalanci, C., Pernici, B., Plebani, P., and Scannapieco, M. 2003b. Data quality assurance in cooperative information systems: A multi-dimension certificate. In Proceedings of the ICDT International Workshop on Data Quality in Cooperative Information Systems (DQCIS).
  22. Catarci, T. and Scannapieco, M. 2002. Data quality under the computer science perspective. Archivi Computer 2.
  23. Chapman, A., Richards, H., and Hawken, S. 2006. Data and information quality at the Canadian Institute for Health Information. In Proceedings of the 11th International Conference on Information Quality.
  24. Chengalur-Smith, I. N., Ballou, D. P., and Pazer, H. L. 1999. The impact of data quality information on decision making: An exploratory analysis. IEEE Trans. Knowl. Data Eng. 11, 6, 853--864.
  25. Corey, D., Cobler, L., Haynes, K., and Walker, R. 1996. Data quality assurance activities in the military health services system. In Proceedings of the 1st International Conference on Information Quality. 127--153.
  26. Dasu, T. and Johnson, T. 2003. Exploratory Data Mining and Data Cleaning. Probability and Statistics Series, John Wiley.
  27. Data Warehousing Institute. 2006. Data quality and the bottom line: Achieving business success through a commitment to high quality data. http://www.dw-institute.com/.
  28. De Amicis, F., Barone, D., and Batini, C. 2006. An analytical framework to analyze dependencies among data quality dimensions. In Proceedings of the 11th International Conference on Information Quality (ICIQ). 369--383.
  29. De Amicis, F. and Batini, C. 2004. A methodology for data quality assessment on financial data. Studies Commun. Sci. SCKM.
  30. De Michelis, G., Dubois, E., Jarke, M., Matthes, F., Mylopoulos, J., Papazoglou, M., Pohl, K., Schmidt, J., Woo, C., and Yu, E. 1997. Cooperative Information Systems: A Manifesto. In Cooperative Information Systems: Trends & Directions, M. Papazoglou and G. Schlageter, Eds. Academic Press.
  31. De Santis, L., Scannapieco, M., and Catarci, T. 2003. Trusting data quality in cooperative information systems. In Proceedings of the 11th International Conference on Cooperative Information Systems (CoopIS). Catania, Italy.
  32. Dedeke, A. 2005. Building quality into the information supply chain. In Advances in Management Information Systems-Information Quality Monograph (AMIS-IQ), R. Wang, E. Pierce, S. Madnick, and C. W. Fisher, Eds.
  33. DQI. 2004. Data quality initiative framework. Project report. www.wales.nhs.uk/sites/documents/319/DQI_Framwork_Update_Letter_160604.pdf
  34. English, L. 1999. Improving Data Warehouse and Business Information Quality. Wiley & Sons.
  35. English, L. 2002. Process management and information quality: How improving information production processes improved information (product) quality. In Proceedings of the 7th International Conference on Information Quality (ICIQ). 206--211.
  36. Eppler, M. and Helfert, M. 2004. A classification and analysis of data quality costs. In Proceedings of the 9th International Conference on Information Quality (ICIQ).
  37. Eppler, M. and Münzenmaier, P. 2002. Measuring information quality in the Web context: A survey of state-of-the-art instruments and an application methodology. In Proceedings of the 7th International Conference on Information Quality (ICIQ).
  38. Falorsi, P., Pallara, S., Pavone, A., Alessandroni, A., Massella, E., and Scannapieco, M. 2003. Improving the quality of toponymic data in the Italian public administration. In Proceedings of the ICDT Workshop on Data Quality in Cooperative Information Systems (DQCIS).
  39. Fellegi, I. P. and Holt, D. 1976. A systematic approach to automatic edit and imputation. J. Amer. Stat. Assoc. 71, 353, 17--35.
  40. Fisher, C. and Kingma, B. 2001. Criticality of data quality as exemplified in two disasters. Inform. Manage. 39, 109--116.
  41. Fraternali, P., Lanzi, P., Matera, M., and Maurino, A. 2004. Model-driven Web usage analysis for the evaluation of Web application quality. J. Web Eng. 3, 2, 124--152.
  42. Gackowski, Z. 2006. Redefining information quality: The operations management approach. In Proceedings of the 11th International Conference on Information Quality (ICIQ). 399--419.
  43. Hammer, M. 1990. Reengineering work: Don't automate, obliterate. Harvard Bus. Rev. 104--112.
  44. Hammer, M. and Champy, J. 2001. Reengineering the Corporation: A Manifesto for Business Revolution. Harper Collins.
  45. Hernandez, M. and Stolfo, S. 1998. Real-world data is dirty: Data cleansing and the merge/purge problem. J. Data Min. Knowl. Dis. 1, 2.
  46. Isakowitz, T., Bieber, M., and Vitali, F. 1998. Web information systems: Introduction. Commun. ACM 41, 7, 78--80.
  47. Isakowitz, T., Stohr, E., and Balasubramanian, P. 1995. RMM: A methodology for structured hypermedia design. Commun. ACM 38, 8.
  48. Istat. 2004. Guidelines for the data quality improvement of localization data in public administration (in Italian). www.istat.it
  49. Jarke, M., Lenzerini, M., Vassiliou, Y., and Vassiliadis, P., Eds. 1995. Fundamentals of Data Warehouses. Springer Verlag.
  50. Jeusfeld, M., Quix, C., and Jarke, M. 1998. Design and analysis of quality information for data warehouses. In Proceedings of the 17th International Conference on Conceptual Modeling.
  51. Kerr, K. and Norris, T. 2004. The development of a healthcare data quality framework and strategy. In Proceedings of the 9th International Conference on Information Quality.
  52. Kettinger, W. and Grover, V. 1995. Special section: Toward a theory of business process change management. J. Manage. Inform. Syst. 12, 1, 9--30.
  53. Kovac, R. and Weickert, C. 2002. Starting with quality: Using TDQM in a start-up organization. In Proceedings of the 7th International Conference on Information Quality (ICIQ). Boston, 69--78.
  54. Lee, Y. W., Strong, D. M., Kahn, B. K., and Wang, R. Y. 2002. AIMQ: A methodology for information quality assessment. Inform. Manage. 40, 2, 133--146.
  55. Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS).
  56. Liu, L. and Chi, L. 2002. Evolutionary data quality. In Proceedings of the 7th International Conference on Information Quality.
  57. Long, J. and Seko, C. April 2005. A cyclic-hierarchical method for database data-quality evaluation and improvement. In Advances in Management Information Systems-Information Quality Monograph (AMIS-IQ), R. Wang, E. Pierce, S. Madnick, and C. W. Fisher, Eds.
  58. Loshin, D. 2004. Enterprise Knowledge Management: The Data Quality Approach. Series in Data Management Systems, Morgan Kaufmann, chapter 4.
  59. Lyman, P. and Varian, H. R. 2003. How much information. http://www.sims.berkeley.edu/how-much-info-2003.
  60. Mecca, G., Atzeni, P., Masci, M., Merialdo, P., and Sindoni, G. 1998. The Araneus Web-based management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, L. M. Haas and A. Tiwary, Eds. ACM Press, 544--546.
  61. Mecca, G., Merialdo, P., Atzeni, P., and Crescenzi, V. 1999. The (short) Araneus guide to Web site development. In Proceedings of the 2nd International Workshop on the Web and Databases (WebDB), in conjunction with SIGMOD.
  62. Motro, A. and Anokhin, P. 2005. Fusionplex: Resolution of data inconsistencies in the data integration of heterogeneous information sources. Inform. Fusion 7, 2, 176--196.
  63. Muthu, S., Whitman, L., and Cheraghi, S. H. 1999. Business process reengineering: A consolidated methodology. In Proceedings of the 4th Annual International Conference on Industrial Engineering Theory, Applications and Practice.
  64. Nadkarni, P. 2006. Delivering data on time: The Assurant Health case. In Proceedings of the 11th International Conference on Information Quality.
  65. Naumann, F. 2002. Quality-driven query answering for integrated information systems. Lecture Notes in Computer Science, vol. 2261.
  66. Nelson, J., Poels, G., Genero, M., and Piattini, M., Eds. 2003. Proceedings of the 2nd International Workshop on Conceptual Modeling Quality (IWCMQ). Lecture Notes in Computer Science, vol. 2814, Springer.
  67. Oakland, J. 1989. Total Quality Management. Springer.
  68. Office of Management and Budget. 2006. Information quality guidelines for ensuring and maximizing the quality, objectivity, utility, and integrity of information disseminated by agencies. http://www.whitehouse.gov/omb/fedreg/reproducible.html.
  69. Pernici, B. and Scannapieco, M. 2003. Data quality in Web information systems. J. Data Semant. 1, 48--68.
  70. Pipino, L., Lee, Y., and Wang, R. 2002. Data quality assessment. Commun. ACM 45, 4.
  71. Raghunathan, S. 1999. Impact of information quality and decision-maker quality on decision quality: A theoretical model and simulation analysis. Decis. Supp. Syst. 26, 275--286.
  72. Rahm, E., Thor, A., Aumüller, D., Do, H.-H., Golovin, N., and Kirsten, T. June 2005. iFuice: Information fusion utilizing instance correspondences and peer mappings. In Proceedings of the 8th International Workshop on the Web and Databases (WebDB), co-located with SIGMOD.
  73. Rao, R. 2003. From unstructured data to actionable intelligence. IT Professional 5, 6, 29--35.
  74. Redman, T. 1996. Data Quality for the Information Age. Artech House.
  75. Redman, T. 1998. The impact of poor data quality on the typical enterprise. Commun. ACM 41, 2, 79--82.
  76. Scannapieco, M., Virgillito, A., Marchetti, M., Mecella, M., and Baldoni, R. 2004. The DaQuinCIS architecture: A platform for exchanging and improving data quality in cooperative information systems. Inform. Syst. 29, 7, 551--582.
  77. Scannapieco, M., Pernici, B., and Pierce, E. 2002. IP-UML: Towards a methodology for quality improvement based on the IP-MAP framework. In Proceedings of the 7th International Conference on Information Quality (ICIQ). Boston.
  78. Scannapieco, M., Pernici, B., and Pierce, E. 2005. IP-UML: A methodology for quality improvement based on IP-MAP and UML. In Information Quality, Advances in Management Information Systems-Information Quality Monograph (AMIS-IQ), R. Wang, E. Pierce, S. Madnick, and C. Fisher, Eds.
  79. Sessions, V. 2007. Employing the TDQM methodology: An assessment of the SC SOR. In Proceedings of the 12th International Conference on Information Quality. 519--537.
  80. Shankaranarayanan, G., Wang, R. Y., and Ziad, M. 2000. Modeling the manufacture of an information product with IP-MAP. In Proceedings of the 6th International Conference on Information Quality (ICIQ 2000). Boston.
  81. Shankaranarayanan, G. and Wang, R. 2007. IPMAP: Current state and perspectives. In Proceedings of the 12th International Conference on Information Quality.
  82. Sheng, Y. 2003. Exploring the mediating and moderating effects of information quality on firm's endeavour on information systems. In Proceedings of the 8th International Conference on Information Quality (ICIQ). 344--352.
  83. Sheng, Y. and Mykytyn, P. 2002. Information technology investment and firm performance: A perspective of data quality. In Proceedings of the 7th International Conference on Information Quality (ICIQ). DC, 132--141.
  84. Stoica, M., Chawat, N., and Shin, N. 2003. An investigation of the methodologies of business process reengineering. In Proceedings of the Information Systems Education Conference.
  85. Su, Y. and Jin, Z. 2004. A methodology for information quality assessment in the designing and manufacturing processes of mechanical products. In Proceedings of the 9th International Conference on Information Quality (ICIQ). 447--465.
  86. US Department of Defense. 1994. Data administration procedures. DoD rep. 8320.1-M.
  87. Vassiliadis, P., Vagena, Z., Skiadopoulos, S., Karayannidis, N., and Sellis, T. 2001. ARKTOS: Towards the modeling, design, control and execution of ETL processes. Inform. Syst. 26, 537--561.
  88. Vermeer, B. 2000. How important is data quality for evaluating the impact of EDI on global supply chains? In Proceedings of the 33rd Hawaii International Conference on System Sciences.
  89. Wand, Y. and Wang, R. 1996. Anchoring data quality dimensions in ontological foundations. Commun. ACM 39, 11.
  90. Wang, R. 1998. A product perspective on total data quality management. Commun. ACM 41, 2.
  91. Wang, R. and Strong, D. 1996. Beyond accuracy: What data quality means to data consumers. J. Manage. Inform. Syst. 12, 4.
  92. World Wide Web Consortium. Web Accessibility Initiative. www.w3.org/WAI/.
  93. Zachman, J. 2006. Zachman Institute for Framework Advancement (ZIFA). www.zifa.com.


      Reviews

      Andreas E. Schwald

      Data quality comprises a wide subject area with a variety of dissimilar issues, and it is far from trivial to compile a comprehensive survey of the field. This treatise on data quality assessment and improvement presents 13 methodologies over 50 pages and lists 92 references up to the year 2007. It aims to provide a "systematic and comparative description along several dimensions, including phases and steps, strategies and techniques, data quality dimensions, types of data, and types of information systems." The paper may be unsatisfactory and too shallow for an advocate of a particular methodology. However, it can be quite helpful for a quick overview, especially for those who are looking for improvement, implementation advice, or new ideas. It covers a wide field and stimulates the interest of the reader, although in most cases a reference is needed to obtain an answer to a specific question or for an in-depth treatment of a topic. This is a noteworthy effort that sums up a great deal of information from a rather heterogeneous field. It covers several publications that might not be available in a library of modest size, thereby bringing this information to the attention of a wider community of readers. However, whether quality lies in the eyes of the observer or in measurable attributes of an object remains an open question.

      Online Computing Reviews Service


      • Published in

        ACM Computing Surveys  Volume 41, Issue 3
        July 2009
        284 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/1541880

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 July 2009
        • Accepted: 1 May 2008
        • Revised: 1 December 2007
        • Received: 1 December 2006
        Published in ACM Computing Surveys (CSUR), Volume 41, Issue 3


        Qualifiers

        • research-article
        • Research
        • Refereed
