Abstract
Automatic detection and removal of inconsistencies in data are open challenges in the data quality management cycle. Cleaning invalid data requires specific knowledge, which often means interaction with domain experts. Domain-specific classes and attributes can be described in ontologies, and attribute value combinations can be labeled as valid or invalid. Our approach to data cleaning allows for the detection and removal of semantic errors in data. Analyzing the replacements made for invalid tuples enables the creation of rules that can minimize the required user interaction. We provide an algorithm that analyzes the frequencies of replacement operations for invalid tuples in the ontology and generates rules, which are then applied automatically in data cleaning environments.
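The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of turning replacement frequencies into rules, not the paper's actual method. It assumes a log of past manual corrections as (invalid tuple, replacement tuple) pairs; the function names, the thresholds `min_support` and `min_confidence`, and the toy attribute values are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def mine_replacement_rules(replacement_log, min_support=3, min_confidence=0.8):
    """Derive 'invalid tuple -> replacement tuple' rules from past corrections.

    replacement_log: iterable of (invalid_tuple, replacement_tuple) pairs.
    min_support and min_confidence are assumed thresholds, not from the paper.
    """
    # Count how often each invalid tuple was replaced by each candidate.
    counts = defaultdict(Counter)
    for invalid, replacement in replacement_log:
        counts[invalid][replacement] += 1

    rules = {}
    for invalid, replacements in counts.items():
        total = sum(replacements.values())
        best, freq = replacements.most_common(1)[0]
        # Keep a rule only if the dominant replacement is frequent enough
        # and clearly preferred over the alternatives.
        if freq >= min_support and freq / total >= min_confidence:
            rules[invalid] = best
    return rules


def apply_rules(records, rules):
    """Replace every record that matches a rule; leave all others unchanged."""
    return [rules.get(record, record) for record in records]


# Hypothetical toy data: (diagnosis, sex) tuples corrected by a domain expert.
log = (
    [(("diagnosis_X", "female"), ("diagnosis_X", "male"))] * 4
    + [(("diagnosis_X", "female"), ("diagnosis_Y", "female"))]
    + [(("diagnosis_Z", "male"), ("diagnosis_Z", "female"))]
)
rules = mine_replacement_rules(log)
cleaned = apply_rules(
    [("diagnosis_X", "female"), ("diagnosis_Z", "male")], rules
)
```

In this sketch, a tuple whose corrections are too rare or too inconsistent produces no rule and would still be left to a domain expert, which matches the abstract's goal of reducing, rather than eliminating, user interaction.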
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brüggemann, S. (2008). Rule Mining for Automatic Ontology Based Data Cleaning. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_52
DOI: https://doi.org/10.1007/978-3-540-78849-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78848-5
Online ISBN: 978-3-540-78849-2
eBook Packages: Computer Science, Computer Science (R0)