Abstract
Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches make use of different types of information about schemas, including structures, linguistic features and data types etc, to measure different types of similarity between the attributes of two schemas. They then combine different types of similarity and use combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds are usually used for filtering out likely incorrect attribute correspondences, which have to be set manually and are matcher and domain dependent. A selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, this strategy clusters a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Shvaiko, P., Euzenat, J.: A survey of schema-based matching approaches. Journal of Data Semantics, 146–171 (2005)
Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB Journal 10(4), 334–350 (2001)
He, B., Chang, K.C.C.: Statistical schema matching across web query interfaces. In: Proceedings of the 22th ACM International Conference on Management of Data (SIGMOD 2003), pp. 217–228 (2003)
He, B., Chang, K.C.C.: Automatic complex schema matching across Web query interfaces: A correlation mining approach. ACM Transactions on Database Systems (TODS) 31(1), 346–395 (2006)
Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In: Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), pp. 117–128 (2002)
He, B., Chang, K.C.C., Han, J.: Discovering complex matchings across web query interfaces: a correlation mining approach. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), pp. 148–157 (2004)
Wu, W., Yu, C.T., Doan, A., Meng, W.: An interactive clustering-based approach to integrating source query interfaces on the deep web. In: Proceedings of the 23rd ACM International Conference on Management of Data (SIGMOD 2004), pp. 95–106 (2004)
Madhavan, J., Bernstein, P.A., Rahm, E.: Generic schema matching with cupid. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), pp. 49–58 (2001)
Wang, J., Wen, J.R., Lochovsky, F.H., Ma, W.Y.: Instance-based schema matching for web databases by domain-specific query probing. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), pp. 408–419 (2004)
Do, H.H., Rahm, E.: Coma - a system for flexible combination of schema matching approaches. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), pp. 610–621 (2002)
Doan, A., Domingos, P., Halevy, A.Y.: Reconciling schemas of disparate data sources: A machine-learning approach. In: Proceedings of the 20th ACM International Conference on Management of Data (SIGMOD 2001), pp. 509–520 (2001)
Hall, P., Dowling, G.: Approximate string matching. Computing Surveys, 381–402 (1980)
Halevy, A.Y., Madhavan, J.: Corpus-Based Knowledge Representation. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 2003), pp. 1567–1572 (2003)
Lovasz, L., Plummer, M.: Matching Theory. North-Holland, Amsterdam (1986)
He, Z., Hong, J., Bell, D.: Schema Matching across Query Interfaces on the Deep Web. In: Gray, A., Jeffery, K., Shao, J. (eds.) BNCOD 2008. LNCS, vol. 5071, pp. 51–62. Springer, Heidelberg (2008)
Gal, A.: Managing Uncertainty in Schema Matching with Top-K Schema Mappings. In: Spaccapietra, S., Aberer, K., Cudré-Mauroux, P. (eds.) Journal on Data Semantics VI. LNCS, vol. 4090, pp. 90–114. Springer, Heidelberg (2006)
Bilke, A., Naumann, F.: Schema Matching using Duplicates. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), pp. 69–80 (2005)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, Z., Hong, J., Bell, D.A. (2009). A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces. In: Sexton, A.P. (eds) Dataspace: The Final Frontier. BNCOD 2009. Lecture Notes in Computer Science, vol 5588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02843-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-02843-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02842-7
Online ISBN: 978-3-642-02843-4
eBook Packages: Computer ScienceComputer Science (R0)