A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces

He, Zhongtian; Hong, Jun; Bell, David A.

doi:10.1007/978-3-642-02843-4_5


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5588)

Included in the following conference series:

British National Conference on Databases


Abstract

Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches use different types of information about schemas, including structures, linguistic features and data types, to measure different types of similarity between the attributes of two schemas. They then combine these similarities and use the combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds are usually used to filter out attribute correspondences that are likely to be incorrect; these thresholds have to be set manually and are matcher- and domain-dependent. A selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, this strategy groups a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
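The null-correspondence idea can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how clusters are formed, how they are prioritized, or how the null option is scored. The sketch shows only one plausible reading of "collective selection with a null option", in which each source attribute picks either a target attribute or no match, and the conflict-free combination with the highest total score wins. The function name `select_collectively`, the score values, and the brute-force search are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical marker for "this source attribute has no correspondence".
NULL = None


def select_collectively(candidates):
    """Pick one option per source attribute so that no target is used twice.

    candidates maps each source attribute to {target_or_NULL: score}.
    Scores, including the score of the NULL option, are assumed inputs.
    Brute force over all combinations; only suitable for small examples.
    """
    sources = sorted(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[s].items() for s in sources)):
        targets = [t for t, _ in choice if t is not NULL]
        if len(targets) != len(set(targets)):
            continue  # two source attributes claim the same target
        total = sum(score for _, score in choice)
        if total > best_score:
            best = dict(zip(sources, (t for t, _ in choice)))
            best_score = total
    return best


# Illustrative (made-up) scores for three source attributes.
example = {
    "author": {"writer": 0.8, "title": 0.1, NULL: 0.1},
    "name": {"writer": 0.6, "title": 0.3, NULL: 0.1},
    "isbn": {"title": 0.1, NULL: 0.5},
}

if __name__ == "__main__":
    print(select_collectively(example))
```

In this toy input, "isbn" is left unmatched because its null option outscores its only candidate, so no threshold is needed to reject the weak correspondence. Both "author" and "name" prefer "writer", and the conflict is resolved by comparing whole assignments rather than individual pairs.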



Author information

Authors and Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, BT7 1NN, UK
Zhongtian He, Jun Hong & David A. Bell


Editor information

Editors and Affiliations

School of Computer Science, University of Birmingham, B15 2TT, Edgbaston, Birmingham, UK
Alan P. Sexton


About this paper

Cite this paper

He, Z., Hong, J., Bell, D.A. (2009). A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces. In: Sexton, A.P. (eds) Dataspace: The Final Frontier. BNCOD 2009. Lecture Notes in Computer Science, vol 5588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02843-4_5


DOI: https://doi.org/10.1007/978-3-642-02843-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02842-7
Online ISBN: 978-3-642-02843-4
eBook Packages: Computer Science, Computer Science (R0)
