Rule Mining for Automatic Ontology Based Data Cleaning

Brüggemann, Stefan

doi:10.1007/978-3-540-78849-2_52

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4976)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Automatic detection and removal of inconsistencies in data are open challenges in the data quality management cycle. Cleaning invalid data requires specific knowledge, which often means interaction with domain experts. Domain-specific classes and attributes can be described in ontologies, and attribute value combinations can be labeled as valid or invalid. Our approach to data cleaning allows for the detection and removal of semantic errors in data. Analyzing the replacements made for invalid data enables the creation of rules that can minimize the required user interaction. We provide an algorithm that analyzes the frequencies of replacement operations for invalid tuples in the ontology and generates rules, which are then applied automatically in data cleaning environments.
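The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of frequency-based rule generation under stated assumptions. It assumes that each expert correction is logged as a pair (invalid attribute value combination, replacement combination), and that a rule is emitted when one replacement is both frequent and dominant. The names `mine_replacement_rules` and `apply_rules`, the thresholds `min_support` and `min_confidence`, and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# An attribute value combination is modelled as a tuple of (attribute, value)
# pairs, e.g. (("city", "Oldenburg"), ("country", "France")).
# This representation is an assumption made for the sketch.


def mine_replacement_rules(replacements, min_support=3, min_confidence=0.8):
    """Derive 'invalid combination -> replacement' rules from logged corrections.

    replacements: iterable of (invalid_combo, replacement_combo) pairs.
    min_support: minimum number of logged corrections for an invalid combo
        (hypothetical threshold).
    min_confidence: minimum share of those corrections that chose the most
        frequent replacement (hypothetical threshold).
    """
    # Count how often each replacement was chosen for each invalid combination.
    counts = defaultdict(Counter)
    for invalid, replacement in replacements:
        counts[invalid][replacement] += 1

    rules = {}
    for invalid, targets in counts.items():
        total = sum(targets.values())
        best, freq = targets.most_common(1)[0]
        # Keep a rule only if it is well supported and unambiguous enough.
        if total >= min_support and freq / total >= min_confidence:
            rules[invalid] = best
    return rules


def apply_rules(record, rules):
    """Apply the first matching rule to a record (a dict of attribute values).

    Returns the possibly corrected record and a flag telling whether a rule fired.
    """
    for invalid, replacement in rules.items():
        if all(record.get(attr) == value for attr, value in invalid):
            return {**record, **dict(replacement)}, True
    return record, False


# Invented sample log: four experts fixed the country, one changed the city.
bad = (("city", "Oldenburg"), ("country", "France"))
good = (("city", "Oldenburg"), ("country", "Germany"))
other = (("city", "Paris"), ("country", "France"))
log = [(bad, good)] * 4 + [(bad, other)]

rules = mine_replacement_rules(log, min_support=3, min_confidence=0.75)
cleaned, changed = apply_rules(
    {"id": 1, "city": "Oldenburg", "country": "France"}, rules
)
```

In this sketch, invalid combinations without a sufficiently frequent and dominant replacement produce no rule, so they would still be handed to a domain expert; raising either threshold trades automation for caution.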

Author information

Authors and Affiliations

OFFIS - Institute for Information Technology, Escherweg 2, 26121, Oldenburg, Germany
Stefan Brüggemann

Editor information

Yanchun Zhang, Ge Yu, Elisa Bertino, Guandong Xu

About this paper

Cite this paper

Brüggemann, S. (2008). Rule Mining for Automatic Ontology Based Data Cleaning. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_52

DOI: https://doi.org/10.1007/978-3-540-78849-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78848-5
Online ISBN: 978-3-540-78849-2
eBook Packages: Computer Science, Computer Science (R0)
