Abstract
The literature provides a wide range of techniques to assess and improve the quality of data. Due to the diversity and complexity of these techniques, research has recently focused on defining methodologies that help select, customize, and apply data quality assessment and improvement techniques. The goal of this article is to provide a systematic and comparative description of such methodologies. Methodologies are compared along several dimensions, including the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and, finally, the types of information systems addressed by each methodology. The article concludes with a summary description of each methodology.