An Uncertain Data Integration System

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2012 (OTM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7566)

Abstract

Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. An important task in setting up a data integration system is matching the attributes of the source schemas. In this paper, we propose a data integration system that uses the knowledge implied by functional dependencies to match the source schemas. We build our system on a probabilistic data model to capture the uncertainty arising during the matching process. Our performance validation confirms that both functional dependencies and a probabilistic data model improve the quality of schema matching. Our experimental results show a significant performance gain over the baseline approaches and show that the system scales well.
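To make the notion of a functional dependency concrete, the following minimal sketch checks whether a dependency holds in a small table. It is an illustration only and is not the matching algorithm of the paper: the function name `holds_fd`, the toy `rows` data, and the attribute names `isbn`, `title`, and `city` are all hypothetical.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The dependency holds when every combination of lhs
    values is associated with exactly one combination of rhs values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen and returns
        # the stored value on every later occurrence.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical toy source: isbn determines title, but not city.
rows = [
    {"isbn": "1", "title": "A", "city": "X"},
    {"isbn": "1", "title": "A", "city": "Y"},
    {"isbn": "2", "title": "B", "city": "X"},
]

isbn_determines_title = holds_fd(rows, ["isbn"], ["title"])
isbn_determines_city = holds_fd(rows, ["isbn"], ["city"])
```

Dependencies found in this way describe how attributes relate within one source, which is the kind of knowledge the abstract refers to when it mentions matching source schemas with functional dependencies.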

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ayat, N., Afsarmanesh, H., Akbarinia, R., Valduriez, P. (2012). An Uncertain Data Integration System. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2012. OTM 2012. Lecture Notes in Computer Science, vol 7566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33615-7_26

  • DOI: https://doi.org/10.1007/978-3-642-33615-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33614-0

  • Online ISBN: 978-3-642-33615-7

  • eBook Packages: Computer Science, Computer Science (R0)
