Abstract
Data quality is a critical issue both in operational databases and in data warehouse systems. Data quality assessment is a strong requirement for the ETL subsystem, since bad data can destroy a data warehouse's credibility. During the last two decades, research and development efforts in the data quality field have produced techniques for data profiling and cleaning, which focus on detecting and correcting bad values in data. Little effort has been devoted to data quality as it relates to the well-formedness of coarse-grained data structures resulting from the assembly of linked data records. This paper proposes a metadata model that supports the structural validation of linked data records from a data quality point of view. The metamodel is built on top of the CWM standard and supports the specification of data structure quality rules at a high level of abstraction, as well as by means of very specific, fine-grained business rules.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Farinha, J., Trigueiros, M.J. (2007). An Extensible Metadata Framework for Data Quality Assessment of Composite Structures. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_4
DOI: https://doi.org/10.1007/978-3-540-74553-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)