An Uncertain Data Integration System

Ayat, Naser; Afsarmanesh, Hamideh; Akbarinia, Reza; Valduriez, Patrick

doi:10.1007/978-3-642-33615-7_26


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7566)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"


Abstract

Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. An important task in setting up a data integration system is to match the attributes of the source schemas. In this paper, we propose a data integration system that uses the knowledge implied by functional dependencies to match the source schemas. We build our system on a probabilistic data model to capture the uncertainty that arises during the matching process. Our performance validation confirms that both functional dependencies and the probabilistic data model improve the quality of schema matching. Our experimental results show a significant performance gain compared to the baseline approaches, and they also show that our system scales well.
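The abstract names two ingredients without detailing them: functional dependencies (FDs) over the source schemas and a probabilistic model of matching uncertainty. The Python sketch below illustrates each one under stated assumptions; it is not the authors' algorithm, and how the system actually exploits FDs during matching is not described here. `fd_holds` checks whether an FD X → Y holds on a source instance, i.e. no two tuples agree on X but differ on Y. `answer_probabilities` shows one common way to use uncertain attribute matches, the by-table semantics for probabilistic schema mappings described by Dong, Halevy and Yu: each candidate mapping carries a probability, and an answer value's probability is the sum of the probabilities of the mappings that produce it. The function names, attribute names (`hPhone`, `oPhone`, `phone`), data, and mapping probabilities are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
# Hypothetical candidate mapping: source attribute -> mediated-schema attribute.
Mapping = Dict[str, str]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Check whether the FD lhs -> rhs holds on this instance:
    no two rows may agree on all lhs attributes yet differ on rhs."""
    seen: Dict[Tuple[str, ...], str] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault records the first rhs value seen for this lhs key
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def answer_probabilities(
    rows: Sequence[Row],
    p_mappings: List[Tuple[Mapping, float]],
    target: str,
) -> Dict[str, float]:
    """By-table semantics: each candidate mapping is applied to the whole
    source; a value's probability is the total probability of the mappings
    under which it appears in the answer to a query on `target`."""
    probs: Dict[str, float] = defaultdict(float)
    for mapping, p in p_mappings:
        sources = [s for s, t in mapping.items() if t == target]
        if not sources:
            continue  # this mapping yields no value for the target attribute
        for value in {row[sources[0]] for row in rows}:
            probs[value] += p
    return dict(probs)


# Hypothetical source instance and two alternative mappings for "phone".
rows = [
    {"name": "Alice", "hPhone": "111", "oPhone": "222", "zip": "1098"},
    {"name": "Bob", "hPhone": "333", "oPhone": "444", "zip": "1098"},
]
p_mappings = [
    ({"name": "name", "hPhone": "phone"}, 0.6),
    ({"name": "name", "oPhone": "phone"}, 0.4),
]

print(fd_holds(rows, ["name"], "hPhone"))  # True
print(fd_holds(rows, ["zip"], "name"))     # False
print(sorted(answer_probabilities(rows, p_mappings, "phone").items()))
# [('111', 0.6), ('222', 0.4), ('333', 0.6), ('444', 0.4)]
```

In this toy instance, `name` → `hPhone` holds while `zip` → `name` does not, because both rows share a zip code but have different names. Phone numbers taken from `hPhone` are returned with probability 0.6 and those from `oPhone` with 0.4, mirroring the probabilities of the two candidate mappings.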

Author information

Authors and Affiliations

Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
Naser Ayat & Hamideh Afsarmanesh
INRIA and LIRMM, Montpellier, France
Reza Akbarinia & Patrick Valduriez

Editor information

Editors and Affiliations

Department of Computer Science, Semantic Technology and Application Research Laboratory (STARLab), Vrije Universiteit Brussel, Building G-10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
Research Centre for Automatic Control, School of Engineering in Information Technology, Campus scientifique, University of Lorraine, CNRS, BP 70239, 54506, Vandoeuvre-les-Nancy, France
Hervé Panetto
La Trobe University, Melbourne, VIC, Australia
Tharam Dillon
Faculty of Computer Science, University of Vienna, 1010, Vienna, Austria
Stefanie Rinderle-Ma
Institute of Databases and Information Systems, Ulm University, Germany
Peter Dadam
School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou
HP Labs, Bristol, UK
Siani Pearson
Johannes Kepler University, Linz, Austria
Alois Ferscha
Università di Modena e Reggio Emilia, Modena, Italy
Sonia Bergamaschi
ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Isabel F. Cruz

About this paper

Cite this paper

Ayat, N., Afsarmanesh, H., Akbarinia, R., Valduriez, P. (2012). An Uncertain Data Integration System. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2012. OTM 2012. Lecture Notes in Computer Science, vol 7566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33615-7_26

DOI: https://doi.org/10.1007/978-3-642-33615-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33614-0
Online ISBN: 978-3-642-33615-7
eBook Packages: Computer Science, Computer Science (R0)
