A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5588)

Abstract

Schema matching is a crucial step in data integration, and many approaches to it have been proposed. These approaches use different types of information about schemas, including structures, linguistic features and data types, to measure different types of similarity between the attributes of two schemas. They then combine these similarities and use the combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds, which have to be set manually and are matcher- and domain-dependent, are usually used to filter out likely incorrect attribute correspondences. A selection strategy is also used to resolve any conflicts between the attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, the strategy groups a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces the use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
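The two characteristics described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how clusters are formed, how they are prioritized, or how a null correspondence is scored, so here the clusters (already in priority order) and the null scores are hypothetical inputs supplied by the caller. The function names, the brute-force search, and the rule that earlier decisions are final are likewise assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# (source attribute, target attribute, combined similarity)
Scored = Tuple[str, str, float]


def select_in_cluster(cluster: List[Scored],
                      null_score: Dict[str, float],
                      taken_targets: Set[str],
                      decided: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Jointly pick one option per undecided source in this cluster.

    Each source may take one of its still-free targets in the cluster or
    the null correspondence (None). The combination with the highest total
    score that uses no target twice wins (brute force; small clusters only).
    """
    options: Dict[str, List[Tuple[Optional[str], float]]] = {}
    for source, target, score in cluster:
        if source in decided or target in taken_targets:
            continue
        # Assumption: the null option is always available, with a caller-given score.
        options.setdefault(source, [(None, null_score[source])]).append((target, score))

    sources = list(options)
    best_combo, best_total = (), float("-inf")
    for combo in product(*(options[s] for s in sources)):
        targets = [t for t, _ in combo if t is not None]
        if len(targets) != len(set(targets)):
            continue  # two sources claim the same target: conflict
        total = sum(score for _, score in combo)
        if total > best_total:
            best_combo, best_total = combo, total
    return {s: t for s, (t, _) in zip(sources, best_combo)}


def prioritized_collective_selection(clusters: List[List[Scored]],
                                     null_score: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Process clusters in the given priority order; earlier decisions are final."""
    result: Dict[str, Optional[str]] = {}
    taken: Set[str] = set()
    for cluster in clusters:
        chosen = select_in_cluster(cluster, null_score, taken, result)
        result.update(chosen)
        taken.update(t for t in chosen.values() if t is not None)
    for source in null_score:  # sources never decided fall back to null
        result.setdefault(source, None)
    return result


# Made-up example data: two clusters, highest priority first.
clusters = [
    [("title", "book_title", 0.9), ("author", "book_title", 0.4), ("author", "writer", 0.8)],
    [("keyword", "writer", 0.3), ("keyword", "subject", 0.2)],
]
null_score = {"title": 0.1, "author": 0.1, "keyword": 0.5}
matches = prioritized_collective_selection(clusters, null_score)
```

In this example `keyword` ends up with the null correspondence because its null score exceeds the similarity of its only remaining candidate. No global threshold is applied; the outcome follows from comparing the options within each cluster.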


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, Z., Hong, J., Bell, D.A. (2009). A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces. In: Sexton, A.P. (eds) Dataspace: The Final Frontier. BNCOD 2009. Lecture Notes in Computer Science, vol 5588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02843-4_5

  • DOI: https://doi.org/10.1007/978-3-642-02843-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02842-7

  • Online ISBN: 978-3-642-02843-4

  • eBook Packages: Computer Science, Computer Science (R0)
