Using Ontologies Providing Domain Knowledge for Data Quality Management

Chapter in: Networked Knowledge - Networked Media

Part of the book series: Studies in Computational Intelligence (SCI, volume 221)

Abstract

Several data quality management (DQM) tasks, such as duplicate detection and consistency checking, depend on domain-specific knowledge. Many DQM approaches could benefit from bringing domain knowledge and DQM metadata together. We present an approach that uses domain knowledge already modeled in ontologies instead of acquiring it through cost-intensive interviews with domain experts. These ontologies can be annotated directly with DQM-specific metadata. Our approach therefore yields a synergy: a domain ontology modeled for another purpose, e.g. defining a shared vocabulary for improved interoperability, can also be used to perform DQM. We present five DQM applications that directly use knowledge provided by domain ontologies. These applications use the ontology structure itself to suggest corrections for invalid data, to identify duplicates, and to store data quality annotations at schema and instance level.
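The abstract names three uses of the ontology structure: suggesting corrections for invalid data, identifying duplicates, and storing data quality annotations at schema and instance level. The following is a minimal sketch of those three ideas under loudly stated assumptions, not the authors' implementation: the ontology is a hypothetical toy class hierarchy held in plain Python dicts instead of an OWL ontology, all class names and synonym labels are invented for illustration, string similarity from the standard library's `difflib` stands in for whatever ranking the chapter actually uses, and the duplicate rule (equal names plus the same ontology class after synonym lookup) is a hypothetical simplification.

```python
import difflib

# HYPOTHETICAL toy ontology: each class maps to its direct superclass
# (None marks the root). A real system would load an OWL/RDF ontology instead.
SUBCLASS_OF = {
    "Equipment": None,
    "Switch": "Equipment",
    "Breaker": "Switch",
    "Disconnector": "Switch",
    "Conductor": "Equipment",
    "ACLineSegment": "Conductor",
}

# HYPOTHETICAL synonym labels attached to ontology classes (lower-case keys).
LABELS = {
    "circuit breaker": "Breaker",
    "breaker": "Breaker",
    "isolator": "Disconnector",
    "disconnector": "Disconnector",
    "line": "ACLineSegment",
}

# Data quality annotations, kept separately at schema level (keyed by class)
# and instance level (keyed by a record id).
annotations = {"schema": {}, "instance": {}}


def descendants(cls):
    """Return every class below cls in the hierarchy, depth-first."""
    result = []
    for child, parent in SUBCLASS_OF.items():
        if parent == cls:
            result.append(child)
            result.extend(descendants(child))
    return result


def check_value(value, expected_class):
    """Check that value names expected_class or one of its subclasses.

    Returns (is_valid, suggestions). Suggestions are drawn only from the
    classes the ontology allows at this position, ranked by string similarity.
    """
    allowed = [expected_class] + descendants(expected_class)
    if value in allowed:
        return True, []
    return False, difflib.get_close_matches(value, allowed, n=3, cutoff=0.6)


def ontology_class(label):
    """Map a free-text label to its ontology class, or None if unknown."""
    return LABELS.get(label.strip().lower())


def likely_duplicates(rec_a, rec_b):
    """HYPOTHETICAL rule: equal names and the same ontology class after
    resolving synonym labels mark two records as duplicate candidates."""
    cls_a = ontology_class(rec_a["type"])
    cls_b = ontology_class(rec_b["type"])
    return rec_a["name"] == rec_b["name"] and cls_a is not None and cls_a == cls_b


def annotate(level, key, note):
    """Attach a data quality note at 'schema' or 'instance' level."""
    annotations[level].setdefault(key, []).append(note)


if __name__ == "__main__":
    print(check_value("Braeker", "Switch"))
    print(likely_duplicates({"name": "Q1", "type": "Circuit Breaker"},
                            {"name": "Q1", "type": "breaker"}))
    annotate("instance", "rec-17", "type corrected from 'Braeker' to 'Breaker'")
    print(annotations)
```

The design choice worth noting is that the candidate set for corrections is derived from the ontology (the expected class and its subclasses) rather than from a separately maintained list of valid values, so extending the hierarchy extends validation and correction suggestions at the same time.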

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Brüggemann, S., Grüning, F. (2009). Using Ontologies Providing Domain Knowledge for Data Quality Management. In: Pellegrini, T., Auer, S., Tochtermann, K., Schaffert, S. (eds) Networked Knowledge - Networked Media. Studies in Computational Intelligence, vol 221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02184-8_13

  • DOI: https://doi.org/10.1007/978-3-642-02184-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02183-1

  • Online ISBN: 978-3-642-02184-8

  • eBook Packages: Engineering, Engineering (R0)
